DONALD B. RUBIN, a witness called on
behalf of the plaintiff, first having been duly
sworn, on oath deposes and says as follows:

EXAMINATION BY MR. FERGUSON:

Q. Please state your name and business
address.

A. My name is Donald B. Rubin, and I'm a 
professor at Harvard University, Department of 
Statistics. And I also do consulting through my 
home, and through a corporation of which I am a 
co-owner. 

Q. I have reviewed your deposition in the 
Minnesota case, and your curriculum vitae which 
was attached to that. Have there been 
significant changes in your curriculum vitae 
since the time you gave that deposition? 

A. Not significant changes. I'm sure 
there have been a few more articles that were 
probably listed as "to be published" that have 
been published, and there are probably a couple 
more that have been accepted for publication, 
and maybe there's something like, I think, an 
Institute of Mathematics council member which I 
got elected to, but I have done it before. But
minor things.

Q. No Nobel Prizes or anything like that? 

A. No. 

Q. How much are you charging for this 
deposition? 

A. Is that something I should answer? 

MR. BIERSTEKER: Is there an 
agreement in the case? 

MR. FERGUSON: We pay the experts' 
fees for their depositions.

MR. BIERSTEKER: You mean the person
taking it does? 

MR. FERGUSON: Yes. 

MR. BIERSTEKER: Go ahead. 

A. (continuing) Fifteen hundred dollars
per hour.

Q. You have gone up since Minnesota. Was 
it Minnesota? 

A. Perhaps. 

MR. FERGUSON: Off the record. 

(Discussion off the record.) 

Q. Now, you worked, Doctor Rubin, on
behalf of one or more tobacco companies in the
Mississippi, Florida, Texas and Minnesota
litigation; is that correct?

MAHANEY REPORTING SERVICES
Tel. (617) 542-4207

A. Correct. 

Q. And you are retained on the Washington 
case; correct? 

A. Yes. 

Q. Are there any other cases in which you 
have been retained involving tobacco litigation? 

A. Oklahoma. And although I have not done 
any work on it yet, I believe the intention is 
to be involved in Massachusetts. And also in 
something called Northwest Laborers. And I may 
be leaving something out, but if I am, it's
because I haven't done any work on it.

Q. So in the Oklahoma case, I would
imagine you have looked at, for example, Doctor
Harrison's work?

A. Correct.

Q. Have you ever before done a damage
analysis in an antitrust case?

A. No.

Q. Have you ever done a damage analysis in
any type of litigation where you have actually
calculated damages?

A. I don't believe -- well, I mean, I
guess I have to clarify what a damage analysis
would be. I have calculated amounts of money, 
helped to calculate amounts of money in a 
litigation. But I don't know whether that's 
referred to as damages, and as it wasn't 
involving damage — again, it's a legal 
definition. That's why I'm waffling. 

It involved somebody who was suing 
an insurance company for not handling their 
policy correctly, and so they had overpaid, was 
the claim. I was involved in that, and how much 
money was involved. I don't know if that 
technically is under the concept of damage 
analysis or not, but it involved money. And it 
was because somebody was supposed — alleged not 
to have handled some claims properly. 

Q. What type of calculations did you do in
that case?

A. For the — it was based on designing a 
sample survey of the records that were in 
dispute, and the survey was done to reaudit 
those records, and then from the survey, 
calculate how much was overpaid by the insurance 
company beyond the amount which they should have 
paid. 
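
The sample-survey calculation described in the answer above can be sketched in code. This is an illustration only and is not part of the testimony: it assumes a simple random sample of re-audited records and the textbook expansion estimator with a finite-population correction. The function name, the sample values and the population size are all hypothetical.

```python
import math

def estimate_total_overpayment(sample_overpayments, population_size, z=1.96):
    """Estimate a population total from a simple random sample of audited records.

    Returns (point estimate, standard error, approximate 95% interval).
    Assumes simple random sampling without replacement (an assumption of this sketch).
    """
    n = len(sample_overpayments)
    mean = sum(sample_overpayments) / n
    # Sample variance with the usual n - 1 denominator.
    variance = sum((x - mean) ** 2 for x in sample_overpayments) / (n - 1)
    # Finite-population correction: the sample is drawn from a fixed set of records.
    fpc = 1 - n / population_size
    total = population_size * mean
    std_error = population_size * math.sqrt(fpc * variance / n)
    return total, std_error, (total - z * std_error, total + z * std_error)

# Hypothetical re-audit: four sampled records out of 100 disputed records.
total, std_error, interval = estimate_total_overpayment([0, 10, 20, 30], 100)
```

The point estimate is reported together with its standard error and interval, which is the pairing the witness later describes as "the usual kind of analysis of data."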


Q. When you — prior to your working on 
the Washington tobacco litigation, were you told 
what your assignment was on that case? 

A. In general terms, I guess I was told 
that it was another case that was — the issues 
were similar, analogous, to the other cases that 
I have been involved in. And, "Here's the 
plaintiff's reports," and "read them," and I 
guess implicitly, "we would like you to write a 
statement and give us some advice on what's new 
in this report, and your opinions about the 
quality of the statistics in the report." 

Q. Which reports did you receive? 

A. I received the Harris report, various
versions of it, and later I received the
supplement, which was quite recently. I 
received other reports that were given to me, 
and I am blocking on names of the authors right 
now. 

Q. Doctor Leffler?

A. Leffler; that's correct.

Q. Doctor Max?

A. Yes. I said yes slowly there, because
in most of these states there had been a Max
report, and so I think I received one that was
specific to Washington, but I may not have. I
don't -- they tend to be quite similar.

Q. In addition to those three experts,
were there any others specific to Washington
that you reviewed, that you recall?

A. I looked at some of the reports that
Harris referred to in some of his footnotes.
For example, he did refer to the Harrison report
which I was looking at for Oklahoma, and there
were a couple of others that I may have quickly
looked at. I think I did quickly look at them,
but I can't remember which ones they were.

Q. Now, in addition to looking at Doctor
Harris's reports and -- let me clarify that.
You've said there were several -- my
recollection is there was a report of Doctor
Harris in about November and a subsequent report
in January, with I think a supplement right
after that in January, and recently there was an
affidavit or declaration.

A. Correct.

Q. And you have looked at all of those?

A. Yes, I believe so. I remember seeing
the report, and I think it was the January 5
report, that was the major one that I read and
-- with its appendix. And there was a very
recent supplement. And you mentioned there was 
one in between, as well? 

Q. I think there was probably just data, 
or an attachment to the report. 

A. You are right. There was an attachment 
that was basically tables and stuff, which I 
glanced at. 

Q. Have you also reviewed the depositions 
that Doctor Harris has given in the Washington 
case? 

A. Yes, I have. 

Q. Did you review Doctor Leffler's
deposition in the Washington case?

A. I don't believe so. It's possible but
-- it doesn't -- it's possible, but I don't
believe so.

Q. Do you have a specific recollection of 
looking at Doctor Max's deposition in the 
Washington case? 

A. No, I don't. Again, it's possible that 
I glanced through it, but the one I read 
carefully was Doctor Harris's. 

Q. I gathered that from your report. 

A. Yes. 

Q. Do you recall with any specificity
which reports in addition to Doctor Harrison's
were cited by Doctor Harris that you went and
independently looked at?

A. I looked through, very quickly, the
book by Manning, Newhouse et alia. I don't
remember the authors. But I did look at that
just to try to get a flavor for what they were
doing. I did not read it.

Do I remember any of the others? If
I saw the footnote, I might be able to pick out
others that I saw, but the names don't come to
mind right now.

Q. In the interest of weight, I no longer
carry all those things with me, so I can't help
you. Have you yourself performed any
calculations or modeling with regard to the
Washington damage claims?

A. Not specifically with respect to
Washington, no.

Q. Have you done some calculations or
modeling with regard to tobacco plaintiffs'
damage claims generally?

A. I have done some general thinking about
what would be required in that modeling, and I
have various notes on what those analyses would
have to look at.

Q. Now, when you say "various notes," do
you mean something separate from your report? 

A. Yes. 

MR. FERGUSON: I'm going to presume,
Peter, without really knowing, that those
haven't been produced, and we would probably be 
interested in them. I will make a note here to 
send you a request. 

MR. BIERSTEKER: That's fine. 

Q. Have you asked for any additional 
material which you have not yet received 
relative to Doctor Harris's report? 

A. I don't believe so. 

Q. I'm going to turn now to your report.
As I said, I really don't have a clue about
statistics so you will have to bear with me as I 
try to wing through this. 

A. Sure. 

Q. But I did bring a couple of copies of 
your report, so you can have it and follow along 
with me. 

A. Thank you. 

Q. This is just some routing information
that we actually wrote on it and forgot to erase
it before we copied it.

A. No problem.

Q. So we can just sort of ignore that
first page.

Source: industrydocuments.ucsf.edu/docs/zjhd0001


A. Okay.

Q. On the first page there, kind of your
introductory section, there are paragraphs
numbered 1, 2, 3, and these are referred to by
you as three broad opinions in the report.

A. Correct.

Q. My first question deals with the
paragraph numbered 1 here. You make the
statement, "Reliable and statistically valid
estimates of health care expenditures incurred
by the State of Washington's Medicaid program as
a result of defendants' alleged wrongful conduct
can be calculated."

In the context of this sentence,
would you define "reliable and statistically
valid estimates"?

A. Sure. There are two primary components
to do this kind of calculation. And one of the
components relies on analyses of data in the
actual world, in our world. And the second
component involves making explicit assumptions
about what would have happened in a world
without the alleged wrongful conduct.

So what I mean by "reliable and
statistically valid" are two things. First,
doing valid analyses that are acceptable on real
world data, that's the first component, ending
up with point estimates and standard errors and
confidence intervals, the usual kind of analysis
of data.

And the second component means being
explicit about what assumptions you are making
in order to make this projection into this other
world that would have existed without the
alleged wrongful conduct, and having some
argument, scientific, logical argument, about
why those assumptions are plausible.

Now, my job as a statistician isn't
to argue about the assumptions -- those are
behavioral and medical assumptions about what
would have happened otherwise, primarily
behavioral -- but to make sure that those
assumptions as translated into mathematics and
statistics are explicit enough so that we can do
the calculation.

Q. You said a valid analysis, analyses
that are acceptable. Acceptable to whom?

A. This is the first component I was
talking about.

Q. Yes.

A. It means to the community of learned
statisticians. Since these are statistical
analyses, they should be considered within the
context of what the statistical literature, the
current statistical -- not the old literature
going back 50 years, but what the current
state-of-the-art statistical literature and
experts think is acceptable. These kinds of
calculations, the amounts we are talking about,
shouldn't be dealt with using classroom
exercise, the first-course-in-statistics
methods. They're important questions that
involve vast sums of money.

And so the analyses that should be
used should be state of the art and serious and
evaluated in that context.

Q. When you use the term "state of the
art," do you have in mind a particular type of
statistical analysis?

A. Not necessarily. There are various
ways of going at problems, but the analyses,
however they are done, should be aware of
complicating issues there that can make some
analyses vulnerable to errors.

Q. For example, if we were talking about a
problem of missing data, am I correct that you
would characterize multiple imputation as state
of the art?

A. That's one.

Q. But there are others?

A. There are other things that are
competitive, depending upon the kind of data
structure that's there and the use that you are
going to make of the data, but that's one way to
go at it. There are other kinds of modeling
approaches based on likelihood theory that can
also handle the data properly. This is in
contrast to filling in the best predictive value
or using a hot-deck technique or other things
that have been around for half a century and
were convenient when people couldn't compute
anything except by hand.

Q. And those, are those state of the art?

A. No.
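
For readers unfamiliar with multiple imputation, the exchange above can be illustrated with the standard combining rules used after the missing values have been filled in several times. This sketch is not part of the testimony. It assumes each completed data set has already been analyzed to give one estimate and one squared standard error. The function name and the numbers are hypothetical.

```python
def pool_multiple_imputations(estimates, variances):
    """Combine results from m completed data sets with the standard combining rules.

    estimates: one point estimate per imputed data set.
    variances: the squared standard error of each of those estimates.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    # Average within-imputation variance.
    within = sum(variances) / m
    # Between-imputation variance: how much the estimates move across imputations.
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    # Total variance adds the extra uncertainty that comes from the missing data.
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results from three imputed data sets.
pooled, total_variance = pool_multiple_imputations([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

Filling in a single "best" value for each missing item gives no between-imputation term to add. That is one way to see why an analysis of a singly imputed data set tends to understate uncertainty.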

Q. So it's your opinion that the failure
to use state-of-the-art methods, for example,
for imputation of missing data, would render the
work invalid?

A. Invalid and unreliable. We don't know
whether it changes answers. We don't know to
what extent it would change point estimates. It
should certainly render estimates of uncertainty
invalid, standard errors, confidence intervals,
and generally make point estimates wrong as
well.

Q. When you use the term "invalid and
unreliable," are they different? Or are you
using them interchangeably?

A. They mean something different, but
there's an overlap. If I try to be explicit,
what I mean by "valid" usually --

MR. FERGUSON: Let's take a quick
break.

(Discussion off the record.)

Q. I had asked whether you were using
"invalid" and "unreliable" interchangeably.

A. Correct. "Invalid" to me has a
slightly more technical meaning. When I use it
I tend to mean with respect to frequency
inference in terms of standard errors and
confidence intervals and testing hypotheses, so
the traditional statistical meaning of invalid
in the world of frequency inference.

There's another version which could
be called likelihood valid, another called
Bayesian valid, but traditional statistics
refers to valid in terms of a frequency
inference. Reliable, I guess to me it's a
broader term. And I'm not sure whether I have a
technical statistical meaning for that or not.
The way I'm using it, I think, is, can you rely
on answers as something that you would go to bat
on, in the colloquial way.

Q. I would like to understand when you use
it in a colloquial way, with a little more
detail. Do you mean that your -- to be
reliable, in your standards, that you would have
to be a hundred percent certain that something
was reliable?

A. No, no.

Q. How certain would you need to be?

A. I don't think I can answer that
question, how certain -- the kinds of percents
that come from like 95 percent confidence and 95
percent significance level, those things have
strict meanings within the world of statistics,
and that's where the "valid" comes in. If you
are claiming something is a 95 percent
confidence interval, I can define that, or 90
percent, or 50 percent confidence. I can define
it. And if the procedure you are using, if you
are claiming it's a 95 percent interval and it's
not, it's only a 20 percent, then that's
invalid. Now, conclusions based on that would
also be unreliable.
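
The distinction drawn in the answer above, between an interval's claimed confidence level and the rate at which the procedure actually covers the truth, can be checked by simulation. The following is a minimal sketch and not part of the testimony. It assumes normally distributed data with a known true mean. The function name, the `width_factor` parameter, the seed and the sample sizes are hypothetical choices made for illustration.

```python
import random

def interval_coverage(width_factor, n=50, trials=4000, true_mean=0.0, seed=12345):
    """Fraction of simulated samples whose interval contains the true mean.

    width_factor = 1.0 gives the usual mean +/- 1.96 standard errors.
    A smaller factor mimics a procedure that understates uncertainty
    while still being labeled a 95 percent interval.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
        std_error = (variance / n) ** 0.5
        half_width = width_factor * 1.96 * std_error
        if abs(mean - true_mean) <= half_width:
            hits += 1
    return hits / trials

honest = interval_coverage(width_factor=1.0)       # close to the nominal 95 percent
too_narrow = interval_coverage(width_factor=0.13)  # roughly 20 percent in this setup
```

In this setup, the interval that is too narrow covers the true mean only about one time in five. A procedure with that behavior, presented as a 95 percent interval, is invalid in the frequency sense the witness describes.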

Q. Okay. Understood. I am still
struggling a little bit with "unreliable." If
you felt more probably than not that an analysis
was correct, would you characterize that as
reliable, in your use of the term?

A. No, I don't think that's the kind of
meaning I want to attach to it. It's more
having to do with the second component I talked
about. If you're clearly explicating
assumptions that you have to make in order to do
an analysis, and those assumptions have no real
data attached to them, so you can't put
confidence intervals or standard errors on them,
but if you stated them clearly, and then you
have confidence intervals and standard errors,
and so far as -- attached to the numbers that
come from the data, I would say under those
stated assumptions the conclusions are reliable,
given those stated assumptions.

I don't know whether the stated
assumptions are correct. That's for someone
else. If they're assumptions about medical
things, I'm not an expert in those things, or
about behavioral things. I'm not an expert in
that. I would say, predicated on those
assumptions, here's a reliable statement.

But if someone just said, "Let me
just assume that the amount that's owed by the
tobacco companies is 1.3257 million dollars --
billion dollars -- and I got that number because
I processed it internally and I came up with
that number," I would say that's unreliable,
because I don't understand the basis for it,
what to put down.

Q. So in other words, you want to be able
to test or evaluate each assumption?

A. I want the assumption stated. Even if
I'm not the right person to evaluate the
assumption, I want them stated there.

Q. What's your understanding of
Washington's claim of misconduct by the tobacco
companies?

A. Well, my understanding, I guess, only
comes, I believe, from reading Harris's report,
Doctor Harris's report. And so I believe the
claims of misconduct have to do with the kind of
things that he indicates in his report as
alleged misconduct.

Q. You understand Washington's claim to be
that the tobacco industry engaged in a
conspiracy to restrain competition?

A. Yes, I understand that from his report.

Q. In the paragraph numbered 2 on the
first page of your report, you state, "The
analysis of Doctor Jeffrey Harris, one of the
State's experts, essentially has none of the
characteristics of such an analysis."

Is there a reason you used the word
"essentially" to qualify that statement?

A. Yes. Because it does have some
aspects.

Q. Are you able to articulate those
aspects for me?

A. Well, one aspect that's good to see is
that there's an attempt to assess the effect of 
the alleged wrongful conducts on behavioral 
changes in people. And to acknowledge that the 
— that absent the alleged misconducts, that 
smoking would not vanish from the face of the 
earth, which, no one thinks, I don't believe, 
that smoking would vanish if these acts did not 
exist. But it would be presumably reduced. 

And so I think Harris's attempt to 
address that issue is good, and it's one of the 
aspects that I argue is needed. If you are 
going to address alleged misconduct, you have to 
face the fact that the alleged acts may affect 
some people, may not affect other people, may 
affect some people to stop smoking somehow.
They may lessen smoking, but they may still
enjoy -- other people may still continue. So he
makes an attempt to acknowledge that, and he
makes an attempt to bring it into his model,
which is an essential feature of the right way
to do that.

Q. That's an aspect that wasn't present in
the other cases that you looked at?

A. Correct. 

Q. Are there other aspects of Doctor
Harris's analysis that would be aspects of a
proper analysis?

A. Well, the attempt to be somewhat
explicit about relative risks of different kinds
of smoking behaviors, the attempt to be explicit
about the prevalence in this world, and in this
world without the alleged acts of misconduct,
prevalence of different kinds of smoking
behaviors. Although not at the right level of
detail, but an attempt to do that. An attempt
to bring in aspects of time. He has prevalence,
for example, changing in time.

But there are many other aspects
that are essential for what I would consider to
be a reliable and statistically valid estimate
that are completely absent.

Q. We may get back to those, but why don't
you, since you have just mentioned it, why don't
you describe those that are missing?

A. I would probably have to look at the
report to give examples. But without doing
that, off the top of my head, one thing he
doesn't do, he doesn't do any conditioning on
potential confounding variables, background
characteristics and other health-related
behaviors. He just ignores that in both parts
of his model, both the medical part and the
behavioral part of the model.

When he summarizes data that have
been analyzed, he treats sort of confidence
intervals and standard errors as if they're
Bayesian. They are not Bayesian. They are
something that he has sort of -- intuitive
things. They are not even -- I don't believe he
even attempts at being statistically valid in
the way I described it earlier. There is not a
justification of it within the standard meaning
in statistics. Especially the report that I
read -- this document is about the report. So
there are supplemental documents.

And the particular assumptions that
are being made in the report were very sort of
loosely defined, "I posit this," "I posit that,"
without much justification at all. In fact, the
things that are being posited are not the right
components to do the correct analysis later, in
any case.

Q. We'll get back to this as we go
through.

A. Those are the ones that come to mind,
without carefully looking at my document.

Q. You mentioned that he hadn't done any
conditioning for confounding background
variables?

A. Correct.

Q. And you said "in both models." The
fact that he constructed two models was, in your
opinion, one of the proper things that he did?

A. The attempt to consider both a medical
model and a behavioral model was proper,
although the way he bundled them together
doesn't really work, because he gets a
behavioral model that's part of the relative
risks, part of the medical model, and that's not
the right way to do the calculations.

Q. Perhaps you could explain how he blends
them together and contrast them with the proper
way to do it.

A. He has his relative risks changing in
time as a function of behavioral changes. And
then he has prevalence as just -- a
smoking-nonsmoking. And so the relative risk
part of the calculation has behavioral changes
bundled into it. And formally what you -- the
way to do it correctly is to keep the relative
risk part to be medical, for different kinds of
exposures to smoke, different kinds of smoking
behaviors, and then have another model with the
prevalence of those different kinds of smoking
behaviors, and he doesn't do that.


Q. Now, you said, "another model for the
prevalence of those smoking behaviors." That's
the model that you referred to at the bottom of
page 1 for the effect of the alleged misconduct
on the subsequent smoking behavior of
individuals?

A. Correct.
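
The separation described in the preceding answers keeps a medical component (relative risks by smoking behavior) apart from a behavioral component (prevalence of each behavior, with and without the alleged conduct). It can be illustrated with a deliberately simplified calculation. This sketch is not part of the testimony and is not the witness's or Doctor Harris's model. It ignores background variables, time and uncertainty. The category names, relative risks and prevalences are hypothetical.

```python
def conduct_attributable_fraction(relative_risk, actual_prevalence, counterfactual_prevalence):
    """Share of expenditure attributed to the difference between two worlds.

    relative_risk: medical component, relative cost for each smoking behavior
        (never-smokers have relative risk 1.0).
    actual_prevalence / counterfactual_prevalence: behavioral component, the
        share of the population in each behavior in each world (each sums to 1).
    """
    actual = sum(actual_prevalence[k] * relative_risk[k] for k in relative_risk)
    counterfactual = sum(counterfactual_prevalence[k] * relative_risk[k] for k in relative_risk)
    return (actual - counterfactual) / actual

# Hypothetical inputs: the relative risks are the same in both worlds and only
# the prevalence of each smoking behavior changes.
relative_risk = {"never": 1.0, "light": 1.5, "heavy": 3.0}
actual = {"never": 0.5, "light": 0.3, "heavy": 0.2}
counterfactual = {"never": 0.6, "light": 0.3, "heavy": 0.1}
fraction = conduct_attributable_fraction(relative_risk, actual, counterfactual)
```

Because the relative risks are held fixed, every behavioral change enters through the prevalence inputs alone. That is the separation the witness says a proper analysis should maintain.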

Q. This sentence continues, "who were or
would have been recipients of Medicaid in
Washington"; correct?

A. Yes.

Q. Is that "recipients" or "those eligible
for Medicaid," or does it make a difference?

A. Well, if you included those who were
eligible or would have been eligible, that would
be fine, as well, because if they didn't receive
any Medicaid expenses, then their values would
be zero. So if you did the calculation
properly, you could have worded this sentence
"to be eligible for" or "eligible for," and the
proper analysis of that, of that defined
population, would have given you the same
answers.

Q. So it doesn't make a difference as to
whether they are recipients or eligible?

A. As long as you are doing the correct
analysis; that's correct.

Q. Continuing on page 2, you have also
referenced this -- you begin the numbered
paragraphs there with paragraph 1. And here you
are describing how you would -- I don't want to
put words in your mouth.

Are you describing how you would
construct a damage model for Washington's case?

A. Yes, they have to compare like with
like. And I believe that's the way almost
everybody who has gone after these issues has
done it.

Q. And the way you would do it is have a
behavioral model which would evaluate the
prevalence?

A. Correct.

Q. And a medical model that would evaluate
the risk?

A. Basically relative risks of different
kinds of smoking behaviors; right.

Q. Going back to this paragraph number 1
on page 2, "important background variables," I
think you have defined these later in the text.
But here we are talking about demographic
considerations?

A. Demographic characteristics and other
things, other kinds of health-related behaviors
that, for example, in various of the other
lawsuits, were tried to be captured by
risk-taking behavior as indicated by whether you
were wearing seat belts or not. Perhaps
depression, perhaps alcohol consumption, other
things like that.

Q. I think rather than go through these
specific points, we'll come back to them as we
go through the course -- because you have
expounded on each of these in the course of the
report.

A. That's fine.

Q. Just turn to page 3 of the report,
please. When you are describing here, using an
example, how you would evaluate a claim premised
on the failure to disseminate information about
health effects of cigarette smoking in 1965 --

A. I want to make sure I'm on the same
page. There are no pages on my document, and I
think they are one minus what you have got. I
have no page numbers at the bottom.

Okay. So I will just add one to
yours when you give me a page number.

Q. I'll try to remember that. In the
paragraph that follows the indented paragraphs
--

A. Right.

Q. -- you make this statement: "This kind
of model would be based, in part, on data
concerning the effect on smoking cessation and
smoking initiation of the availability of the
additional information about smoking's health
risks." What data sets are you aware of
concerning the effect on smoking cessation and
initiation of availability of additional
information?

A. There are some data sets I think that
were -- that I referred to in my statement that
I wrote for Minnesota that just had to do with
sales of cigarettes following different kinds of
information that was provided. I don't have
those in mind at all right now. I do think that
-- and these aren't data sets. If this is not
responsive, just shut me off. But there are
people who do study the effect of information on
behavioral changes in health generally, and how
people stop doing things they know are bad for
them when they find out they are bad for them.

People with high blood pressure,
drinking coffee or caffeine-driven, or people
who eat, continue to eat fatty food despite high
cholesterol when they have been told about
that. So there are people, I have been told by
colleagues at Harvard, there are people who
study such things. There may even be parts of
departments in medical schools that study such
things.

Q. Doctor Harris made an effort to examine
studies of changes in quit rates and smoking
initiation rates following various information,
did he not?

A. Yes, he did. 

Q. Are you critical of his analysis to the 
extent that he did that? 

A. That was the first step. I think 
that's good. But what I would like to have seen 
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as a statistician is, since he is basing his 
assumptions on data, I would like to see some 
analysis of the data, where the real data sets 
were. So analyses with the usual kinds of 
statistical analysis with point estimates, 
confidence intervals and standard errors. 
Instead, what we somehow — it was sort of 
filtered through his mind, and an answer comes 
out. 

And this gets back to the validity 
and reliability thing. He is obviously a 
storehouse of information on these things, but 
it's not being produced in a way that's 
statistically valid or reliable, because I don't 
know what's gone on, that he takes that 
information and produces things, "based on this 
I now posit" something. If it's based on data, 
then it can have standard errors and
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19 confidence intervals. 

20 If it's based on a data set, what 

21 about problems of relatives to the population? 

22 Issues like that should be addressed. If it's 

23 based on additional assumptions beyond the data 

24 set, those can be explicated, so I can know 

25 what's going on. 
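The "point estimates, confidence intervals and standard errors" the witness asks for can be illustrated with a minimal sketch. The sample figures below are hypothetical and are not taken from the Harris report or any study it cites; the interval uses the ordinary normal approximation for a proportion.

```python
import math

def proportion_summary(successes, n, z=1.96):
    """Point estimate, standard error, and normal-approximation 95% interval."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)

# Hypothetical sample: 120 of 1,000 surveyed smokers quit during the period.
estimate, std_err, interval = proportion_summary(120, 1000)
print(f"quit rate = {estimate:.3f}, SE = {std_err:.4f}, "
      f"95% CI = ({interval[0]:.3f}, {interval[1]:.3f})")
```

Reporting all three numbers lets a reader see both the estimate and how much sampling uncertainty surrounds it.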
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1 Q. If the studies upon which he based that 

2 portion of his opinion explicated their 

3 assumptions in each study, and if the 

4 statistical steps you just described had been 

5 done as part of that study, would that be 

6 satisfactory? 

7 A. Yes. As well as going on — and if 

8 there are other assumptions that Doctor Harris 

9 has to make to reach his conclusions about 

10 behaviors in this counterfactual world, he has 

11 to explicate those assumptions. 

12 Q. Have you examined the underlying 

13 studies with respect to his opinion on the 

14 effects of smoking to see if they expressed 

15 individually the assumptions that are made as to 

16 each? 

17 A. No, I have not. 

18 Q. Similarly, you haven't looked to see if 

19 they have gone through the statistical steps? 

20 A. No, because I don't — that would be an 


http://legacy.library.ucsfSdu/tiel/vtjiittpSaf0yO)^pdfindustrydocuments.ucsf.edu/docs/zjhd0001 



21 enormous effort to try to figure out from the 

22 Harris report which parts of the studies were 

23 what he's relying on. It's much easier to 

24 evaluate something that someone says, "This is 

25 what I did," than to say, "Here's this mass of 
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1 literature I read over many years and I now 

2 process it internally and come up with these 

3 answers." It's a very difficult task to try to 

4 unwind someone else's process. 

5 Q. Do you know Doctor Harris? 

6 A. No, I don't. I would like to meet him 

7 sometime. 

8 Q. The next sentence in that paragraph 

9 that we are looking at states, "Such a model of 

10 smoking behavior would clearly have to consider 

11 background characteristics of individuals who 

12 were or would have been recipients of 

13 State-funded health care; adjust for confounding 

14 factors that alter the effect of the alleged 

15 misconduct on smoking behavior; and consider how 

16 changes in smoking behavior in the pertinent 

17 population would change from 1965 through the 

18 end of the damage period in 2001." 

19 With regard to the first two clauses 

20 of that sentence, background characteristics and 

21 confounding factors are different; correct? 

22 A. Correct. Although I'm not sure how 

23 perfectly consistent I have been with that 
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24 distinction here. What I intend to mean by 

25 "background characteristics" are demographic 


1 characteristics that wouldn't change in this 

2 counterfactual world, and by "other confounding 

3 factors," I intend to mean other things that 

4 might change in a counterfactual world without 

5 the alleged misconduct, so diet, the use of 

6 nicotine patches, things like that. 

7 

8 (Discussion off the record.) 

9 

10 Q. Background characteristics, you just 

11 said, would be demographic things that wouldn't 

12 change from the factual to the counterfactual 

13 world? 

14 A. Right. 

15 Q. What demographic considerations would 

16 be relevant in the damage modeling in this case? 

17 A. There would be a very long list that I 

18 would try to cull from looking at all the 

19 various plaintiffs' reports in all the various 

20 states. So off the top of my head, things like 

21 gender and date of birth and race, perhaps 

22 parents' education, some measure of the 

23 socioeconomic status when they were growing up,

24 maybe whether parents smoked, things like that.

25 I hope I have been consistent with 
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the background characteristics and other 
confounding factors in this report. 

Q. I think you have. 

A. I think I have. I was trying to, but 

it's possible that there's one place where I 
would just say "confounding factors" and maybe I 
meant that to include background characteristics 
as well. 

Q. I'm not trying to trap you here. I'm 
just trying to figure this out. In fact, in the 
paragraph below that, you will see several of 
the ones you mentioned are included there: year 
of birth, gender, race, income level, education, 
baseline mental and physical health. 

A. Yes. 

Q. Are there data sets that include all of 
these items? 

A. Well, let's see. I think — 

Q. You said all of them? 

A. Other data sets include all of them. 

Q. Is it one data set that — is it a 

combination of data sets that would include all 
of these factors? 

A. I believe there is, if — the 
combinations of them. NMES includes, and I'm 
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1 trying to think back to these other interviews 

2 they are piecing together, the Tobacco Use 

3 Survey and Behavioral Risk Factors Survey, and I 

4 think there are many of these aspects; if you 

5 put all the data sets together, there are many 

6 of those characteristics. But those aren't 

7 fresh in my mind right now because those are 

8 from months ago. 

9 Q. In your review of Doctor Harris's 

10 report, did he make any effort to account for 

11 any demographic characteristics? 

12 A. Well, not in his analyses in any formal 

13 way. He does report sex differences, male 

14 versus female gender differences, and how 

15 smoking initiation rates and quit rates were 

16 different, and tries to do something for them, 

17 but not in a formal way. 

18 Q. In some fashion he accounts for income 

19 levels, because he is using low income as a 

20 proxy for Medicaid; right? Do you remember that 

21 part? 

22 A. Right. So he is trying to get towards 

23 the relevant population, but I don't believe — 

24 for the risk ratios, I think most of those 

25 studies were just with the general population. 
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1 Q. Did you review the underlying studies 
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2 on which he based his conclusions about risk 

3 ratios? 

4 A. No, I did not. I may have reviewed 

5 some of them previously in a general sort of 

6 way. One of them he relies on is Harrison, so I 

7 guess I did review that carefully. And some of 

8 the other ones that appeared in other states 

9 which he may be relying on, I have reviewed, the 

10 ones in states that we mentioned at the 

11 beginning. 

12 Q. Do NMES or TUS or the BRFSS data sets, 

13 do any of those include information on baseline 

14 mental and physical health? I don't know, so — 

15 A. I think some of them include variables 

16 on depression, I believe. I don't remember 

17 which — which ones now. And the — I mean the 

18 attempt to call this baseline — I'm sorry. The 

19 reason I called it "baseline mental and physical 

20 health" in this sentence is because I was 

21 referring to background characteristics, so to 

22 try to convey the idea that ideally these 

23 measures of mental and physical health would be 

24 ones that could not be affected by -- would

25 not change in the counterfactual world. 
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1 That's what the baseline is supposed 

2 to refer to. And I believe that this measure of 

3 depression, in whichever data set it was, it was 

4 one of the ones that would be arguably an 
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5 outcome and could be affected by a world without 

6 the alleged misconduct. 

7 Q. You are not suggesting, are you, that 

8 any of these background characteristics you have 

9 described would produce a different result in a 

10 counterfactual world? You are just suggesting 

11 that that possibility should be considered? 

12 MR. BIERSTEKER: I object to the 

13 form.

14 A. Okay. I'm saying that they should be 

15 considered, and in the analyses that I have seen 

16 from other states, there is a long list of 

17 variables that were considered. And to some 

18 extent, the authors of those reports thought it 

19 was very important to include those variables, 

20 and perhaps more variables that they were not 

21 able to include. 

22 Now, whether including them and 

23 doing a correct analysis using them would change 

24 point estimates, I don't know, because I haven't 

25 done those analyses. 
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1 Q. In reference to other states' reports 

2 are you talking basically about the reports of 

3 the Millers? 

4 A. The Millers, the Zeger report in 

5 Minnesota, the Harrison one for Oklahoma. 

6 Q. Did each of those attempt to account 
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7 for these background variables with a regression 

8 analysis? 

9 A. I believe they all tried to account for 

10 the set of background variables and confounding 

11 factors using some form of regression analysis, 

12 yes. 

13 Q. To the extent that Doctor Harris relied 

14 on the work of Eve Leonard or Vincent Miller or 

15 Doctor Zeger or Doctor Harrison, would it be 

16 correct that to some extent he has accounted for 

17 background variables, at least to the extent 

18 that each of their works did? 

19 A. Not really. 

20 Q. Could you explain that? 

21 A. Yes. There are a couple of reasons. 

22 First, all the reports that I have read don't 

23 validly or reliably control for the background 

24 variables that they are claiming to control 

25 for. 
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1 The second reason is even if they 

2 did, and so therefore they had relative risks as 

3 a function of all these background 

4 characteristics and confounding factors, so they 

5 had them as a function of — those relative 

6 risks as a function of those background 

7 characteristics would have to be used by Harris 

8 in his analysis, and not just averaged over all 

9 those background characteristics, which is what 
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10 he does. You can't sort of take — the weighted 

11 average of two things does not equal the average 

12 of one times the average of the other. And 

13 that's sort of what Harris does in his 

14 analysis. 

15 So even if, to be clear, even if 

16 those analyses that he was relying on were all 

17 perfect, that Harris doesn't use the information 

18 from them in a way that will lead to valid and 

19 reliable answers. 
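The arithmetic behind the remark that a weighted average of products is not the product of the averages can be shown with a small sketch. The two subgroups, their sizes, costs, and smoking-attributable fractions are all hypothetical and are chosen only to make the gap visible.

```python
# Hypothetical subgroups: (number of people, mean cost per person,
# fraction of that cost attributed to smoking).
groups = [
    (800, 1000.0, 0.05),
    (200, 5000.0, 0.20),
]

total_people = sum(n for n, _, _ in groups)

# Apply each subgroup's fraction to that subgroup's own costs, then add up.
stratified = sum(n * cost * frac for n, cost, frac in groups)

# Average the costs and the fractions separately, then multiply.
mean_cost = sum(n * cost for n, cost, _ in groups) / total_people
mean_frac = sum(n * frac for n, _, frac in groups) / total_people
averaged = total_people * mean_cost * mean_frac

print(f"stratified total: {stratified:,.0f}")
print(f"product of averages: {averaged:,.0f}")
```

The two totals differ whenever cost and attributable fraction vary together across subgroups, which is why the subgroup-level quantities have to be carried through the calculation.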

20 Q. In this sentence that refers to 

21 examples of background characteristics, you end 

22 the list of examples by saying, "and many other 

23 factors, such as those that determine 

24 eligibility for the State's Medicaid program"? 

25 A. Correct. 
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1 Q. How many other factors? Do you have 

2 any idea? 

3 A. Well, I'm not an expert on deciding 

4 what those factors should be. As a statistician 

5 I am an expert on saying they can be as big as 

6 you can possibly get, and I will tell you how to 

7 do the analysis for them, how to make the 

8 adjustments and control for those. At this 

9 point I think the reasonable thing to do to talk 

10 about these other factors is minimally the 

11 complete collection of all the factors that the 
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12 plaintiffs in all the states think are 

13 important. 

14 So if you look in each state, they 

15 have claims as to which factors they think 

16 should be included. And in some cases, things 

17 that, I think, plaintiffs' experts thought 

18 should be included were not. I believe Samet 

19 had some suggestions in Minnesota that in fact 

20 were not in the model. 

21 But I believe that if you looked at 

22 all the states and all the plaintiffs and looked 

23 at the collection of characteristics that they 

24 think should be controlled for, that all those 

25 should be considered and presumably controlled 
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1 for. 

2 Q. What type of expertise, Doctor Rubin,

3 would be best utilized to determine which 

4 characteristics should be controlled for, other 

5 than plaintiffs' lawyers? 

6 A. Obviously medical people who understand 

7 what kinds of characteristics affect disease. 

8 Q. Do you mean epidemiologists? 

9 A. Medical doctors and epidemiologists and 

10 health economists, because we are talking about 

11 costs in dollars, not just, are you diseased or 

12 not. So which things affect disease as a 

13 function of different kinds of people's 

14 background characteristics. These people with 





15 different characteristics would spend — would 

16 have different costs. So that would — and 

17 statisticians as well, because I think they can 

18 be helpful about suggesting, through a general 

19 kind of knowledge, what kinds of variables might 

20 have been left out or suggest the need to 

21 consider interactions between variables or 

22 transformations of variables. 

23 Q. I presume you wouldn't need a 

24 specialist in Medicaid? One of these experts 

25 could simply find out what the Medicaid 
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1 eligibility requirements were? 

2 A. It wouldn't hurt to have — to know 

3 what was happening in Medicaid, especially 

4 because, as I understand it, Medicaid 

5 requirements vary from state to state. It might 

6 be useful to have a person who knew Medicaid in 

7 that state. 

8 Q. The paragraph continues, "Ideally, 

9 these background characteristics would be 

10 measured on each individual before the date of 

11 the alleged misconduct or, equivalently, before 

12 the moment in time when the alleged misconduct 

13 had an effect on that characteristic for that 

14 individual." 

15 And your use of the word "ideally" 

16 there suggests that this isn't possible; is that

17 right?

18 A. Correct. I mean, the "ideally" means

19 something like this. Baseline mental health

20 would be ideally one that could not possibly be 

21 affected by the alleged misconduct, so it would 

22 have the same value in this world, actual world, 

23 as in the counterfactual world. But that's — 

24 it's a false hope that all the characteristics 

25 that we have to control for would be available 
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1 in a data set in that form. They would be prior 

2 to it. And the fact that this doesn't exist 

3 means that the correct analyses are somewhat 

4 more complicated and have to rely on some other 

5 assumptions. 

6 Q. What are the statistically valid 

7 methods for considering background 

8 characteristics? Can I include confounding

9 factors with that?

10 A. Sure.

11 Q. And the confounding factors?

12 A. In general, you have to build a model

13 for health care -- let's talk about the health

14 care model, not the behavioral model.

15 Q. Sure.

16 A. The health care model, you have to

17 build a model where you have health care

18 expenditures, Medicaid expenditures, for

19 example, as a function of background
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20 characteristics, these other confounding factors 

21 and smoking behavior. And where smoking 

22 behaviors are — there are various kinds of 

23 smoking behaviors that are considered relevant 

24 to different kinds of health care costs. 

25 Now, when you build a model like 
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1 that, the most typical kind of model is a linear 

2 regression model or some variant of that, that 

3 involves transformations like logs or some type 

4 of two-part model, the probability of having 

5 positive amounts and another part for the amount 

6 given that it's positive, those kinds of models, 

7 since the point of those kinds of models is to 

8 compare nonsmokers with smokers of different 

9 types, because where relative risk comes from 

10 cost, you have to be very careful about the 

11 difference in background characteristics and 

12 other confounding factors between smokers and 

13 nonsmokers. 

14 If you simply do a linear regression 

15 without carefully checking to see what the 

16 differences in background characteristics are 

17 between smokers and nonsmokers, the results are 

18 known to be unreliable and generally invalid. 
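The two-part structure mentioned in this answer (one part for the probability of any positive expenditure, another for the amount given that it is positive) can be sketched as follows. This is only the bare decomposition on hypothetical numbers; a model of the kind described in the testimony would make both parts depend on background characteristics, confounding factors, and smoking behavior.

```python
def two_part_mean(costs):
    """Return (share with positive cost, mean cost among the positive,
    implied overall mean = share * conditional mean)."""
    positive = [c for c in costs if c > 0]
    p_positive = len(positive) / len(costs)
    mean_positive = sum(positive) / len(positive) if positive else 0.0
    return p_positive, mean_positive, p_positive * mean_positive

# Hypothetical annual expenditures for six people; half have zero cost.
costs = [0, 0, 0, 400, 1200, 2600]
p_pos, mean_pos, overall = two_part_mean(costs)
print(p_pos, mean_pos, overall)
```

Splitting the problem this way handles the large share of people with no expenditures separately from the skewed positive amounts.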

19 Q. Known by whom? 

20 A. Known in the statistical literature, 

21 that if you rely on linear regression modeling 
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22 in these situations, do an adjustment between 

23 smokers and nonsmokers, let's say, and the two 

24 groups differ on background characteristics, 

25 it's been known for a long time, probably for — 
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1 I don't know. I could probably find examples 

2 going back a quarter century, but certainly in 

3 things that I have published with William 

4 Corcoran, as described in the Minnesota report 

5 that I wrote, published, I think in 1973, so I 

6 guess it is a quarter century now — that it's 

7 unreliable to do adjustments that way. 

8 Q. How would you solve this problem? 

9 A. I would address it by — suppose we 

10 have the data set available, we have some 

11 collection of background characteristics and 

12 different smoking behaviors. I would first — 

13 and this has been a plan of attack that has been 

14 pretty successful in some problems that is, I 

15 guess, gaining some popularity — do a 

16 propensity scoring analysis to find out how 

17 smokers and nonsmokers differ with respect to 

18 these background characteristics, to see how far 

19 apart these two groups are. And if they overlap 

20 a lot on the background characteristics, then 

21 the standard linear models are pretty reliable, 

22 if the groups really overlap a lot, of smokers 

23 and nonsmokers. 

24 In the data sets we see, primarily

25 NMES, they don't overlap a lot. Smokers are
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1 just different from nonsmokers even in the 

2 Medicaid subgroup, which, according to the 

3 guidelines in the literature over the last 

4 quarter century, suggest that the answers that 

5 are produced from doing linear modeling are 

6 unreliable, untrustworthy, invalid. 

7 So I would take a data set like that 

8 and I would attempt to do some either matching 

9 or subclassification, depending upon the 

10 relative density of smokers and nonsmokers, and 

11 use that to fit more complex models which are 

12 within each of the subclasses or matched pairs

13 to get more — to get estimates that are valid 

14 and reliable. 

15 You, when you do that, you will 

16 find, because there wasn't extrapolation 

17 involved, that standard errors will go up 

18 because you are being honest about the source of 

19 uncertainty. Not a straight line saying "the 

20 world is always going to be a straight line" to 

21 the end of the room or fitting a curve. If they 

22 fit a curve from these little three points here, 

23 I know the curve is going out and — those 

24 models are known to be unreliable

25 unless the groups overlap. 
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1 These propensity scoring analyses up 

2 front allow you to compare groups of smokers and 

3 nonsmokers who have similar background 

4 characteristics. So the linear model part is 

5 much more trustworthy because it's comparing 

6 people who are more similar. 
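The general idea of propensity scoring followed by subclassification can be sketched on simulated data. Everything here is an assumption made for illustration: the single covariate `x`, the simulated costs, the hand-written logistic fit, and the choice of five subclasses. It is not the analysis the witness performed for Minnesota or Oklahoma.

```python
import math
import random
from statistics import mean

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ts, steps=300, lr=0.1):
    """Fit P(smoker | x) = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        resid = [t - sigmoid(a + b * x) for x, t in zip(xs, ts)]
        a += lr * sum(resid) / n
        b += lr * sum(r * x for r, x in zip(resid, xs)) / n
    return a, b

def subclassified_difference(rows, n_strata=5):
    """Average the smoker-minus-nonsmoker cost difference within propensity-score subclasses."""
    rows = sorted(rows, key=lambda r: r["score"])
    size = len(rows) // n_strata
    weighted_sum, used = 0.0, 0
    for k in range(n_strata):
        stop = (k + 1) * size if k < n_strata - 1 else len(rows)
        block = rows[k * size:stop]
        smokers = [r["cost"] for r in block if r["smoker"]]
        others = [r["cost"] for r in block if not r["smoker"]]
        if not smokers or not others:
            continue  # no overlap in this subclass, so no comparison is made
        weighted_sum += (mean(smokers) - mean(others)) * len(block)
        used += len(block)
    return weighted_sum / used

# Simulated people: smoking is more likely at high x, and x also raises cost.
random.seed(0)
people = []
for _ in range(1000):
    x = random.uniform(-5, 5)
    smoker = 1 if random.random() < sigmoid(0.5 * x) else 0
    people.append({"x": x, "smoker": smoker, "cost": 1000 + 10 * x + 100 * smoker})

intercept, slope = fit_logistic([p["x"] for p in people], [p["smoker"] for p in people])
for p in people:
    p["score"] = sigmoid(intercept + slope * p["x"])

naive = (mean(p["cost"] for p in people if p["smoker"])
         - mean(p["cost"] for p in people if not p["smoker"]))
adjusted = subclassified_difference(people)
print(f"naive difference: {naive:.1f}, subclassified difference: {adjusted:.1f}")
```

In this simulation the built-in smoking effect is 100, and comparing smokers with nonsmokers inside subclasses of similar propensity score recovers it more closely than the raw difference in means does.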

7 Q. If the propensity scoring shows they 

8 are not similar, you break it down into more 

9 complex submodeling? 

10 A. A separate model for people who are 

11 similar to each other. I would do a more 

12 complex modeling. I would use, for example, the 

13 standard models, but within subgroups of smokers 

14 and nonsmokers who are similar on background 

15 characteristics. 

16 Q. For that subgroup, you would use a 

17 regression analysis? 

18 A. But I would be more careful than the 

19 authors of these reports I have seen so that 

20 within that subgroup the model looks like it's 

21 trustworthy. 

22 Q. You started by assuming that the data 

23 set was available? 

24 A. Correct. 

25 Q. And I think you earlier said that at 
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1 least a substantial portion of data is available 

2 in some combination of sources? 

3 A. Correct. What I was saying is the list 

4 of factors that should be included, background 

5 and other confounding factors, I would begin 

6 with the list of all the variables, all the 

7 factors that the plaintiffs have brought 

8 forward. And if they are bringing them forward 

9 and doing analysis on them, then they must 

10 exist. So I'm just saying they exist 

11 somewhere. 

12 And for example, in Oklahoma, 

13 Harrison does an analysis with — in the 30's or 

14 40's, I don't know — 30 some background 

15 characteristics and confounding factors. And I 

16 don't know, I don't remember looking carefully 

17 to see which ones he has left out that other 

18 plaintiffs have included, but it's a fairly big 

19 set that Harrison includes in NMES alone. 

20 Q. Have you done any propensity scoring 

21 with the NMES classifications? 

22 A. Yes, there were some that I produced 

23 for Minnesota in my — I guess it was a 

24 supplemental report that I did. There are 

25 explicit propensity analyses in Minnesota, and 
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1 there were some done for Oklahoma. 

2 Q. Most of the stuff in Minnesota is under 
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3 seal and as a consequence we didn't get it. 

4 A. That's not my fault. 

5 

6 (Discussion off the record.) 

7 

8 A. (continuing) There are some analyses 

9 for Oklahoma as well, and so if Oklahoma is a 

10 participant, maybe you can find it there. 

11 Q. Can you tell me generally, as far as 

12 you can recall, in what characteristics you 

13 found smokers to be different from nonsmokers in 

14 the NMES data? 

15 A. Let me think. I looked at that fairly 

16 carefully in Oklahoma, I believe. I really 

17 don't remember. I apologize. I wish I did. 

18 Q. That's all right. 

19 A. I could make guesses, but I don't think 

20 that's very useful right now. 

21 Q. No. I'll track down the Oklahoma 

22 data. 

23 A. But there were fairly substantial 

24 differences between the smokers and the 

25 nonsmoking group in the Medicaid, NMES data set 
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1 of these background characteristics and 

2 confounding factors, just ones that Harrison 

3 used. I believe what the analyses I did were — 

4 I think I just took the characteristics that 

5 Harrison wanted to adjust for in his linear

6 regression model and did the propensity scoring

7 analysis to see how far apart the smokers and 

8 nonsmokers were. And I did different 

9 definitions of smokers and nonsmokers, the 

10 ever-never, current-not current, and there was 

11 one other one — and there was one subgroup of 

12 one of those, like the never versus current, 

13 leaving out the former. And so some are quite 

14 different from each other. 

15 Q. To do the modeling as you suggested 

16 should be done is not going to require a Cray's 

17 supercomputer, is it? 

18 A. No. 

19 Q. It's going to require a fair amount of 

20 time? 

21 A. But considering the amount of time 

22 that's been put into these things so far, not 

23 substantial. 

24 Q. Tell me how many person hours, as best 

25 you can pencil it out, would be required to do 
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1 the analysis that you think would be necessary 

2 in this case. 

3 A. Well, can we take, for example, 

4 Harrison's analysis as one? 

5 Q. Sure. 

6 A. Because that's one I have most recently 

7 looked at. Some number of times longer than

8 Harrison took to do his. I don't know how long

9 he took to do his. 

10 Q. That's not terribly helpful. 

11 A. But having — if you looked at what his 

12 analysis is, I'm saying you really have to do 

13 that, and maybe take five times as much as he 

14 spent doing his analysis. How long would it 

15 take? It would obviously be different kinds of 

16 people involved. I would not do it. It would 

17 be outrageous for me to do it. You would want a 

18 team where there was some direction from me, 

19 some direction from another senior statistician, 

20 and some more junior people to do some of the 

21 more mundane computing things, so it would stay 

22 under control. 

23 I don't know. A couple of months? 

24 I think you would have a good analysis that I 

25 would believe was valid, in the sense of I could 
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1 trust the standard errors and confidence 

2 intervals coming out of it. 

3 Q. Now, you have written that we shouldn't 

4 waste a major portion of our resources fixing up 

5 a relatively minor problem. None of these 

6 problems you would characterize as minor? 

7 MR. BIERSTEKER: Objection to the 

8 form.

9 A. That quotation I think is in the 

10 context of missing data generally. 
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11 Q. I believe that's correct. 

12 A. It was in the context where data sets 

13 have five, ten, maybe 20 percent missing values, 

14 and that it's a minor problem. Or depending 

15 upon the kind of analysis, even two percent 

16 missing data can be a major problem, as I have 

17 also said. But with respect to missing data 

18 here, it's a major problem because NMES, for 

19 example, which is the major data set, I believe, 

20 underlying most of these states' analyses, the 

21 critical health care information, half of it is 

22 missing, so that's not a minor problem. It's 

23 like having half the data, people just have 

24 filled in something they got from some 50-year- 

25 old method and went on and pretended the problem 
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1 didn't exist. 

2 So I would not say that's a minor

3 problem. But in general, you don't want to 

4 spend huge resources worrying about a minor 

5 problem. 

6 Let me go back. I'm glad you 

7 brought that up. My "one or two months" were 

8 assuming that the NMES data base was in good 

9 shape, and it's not, because of missing data. 

10 So there would first have to be an effort to fix 

11 up the missing data part of the problem. And 

12 for that, I think this is an example where 





13 multiple imputation would work well as a 

14 solution, because the kind of analysis you want 

15 to do later is flexible analysis. 

16 You want to be able to do more than 

17 one analysis on the data set. You want to have 

18 a data set available for other people to use, 

19 for plaintiffs, defendants, anyone else who 

20 wants to use it, to use, and you would like to 

21 fix up the missing data problem with these 

22 multiple imputations and let anybody do an 

23 analysis that they wanted to do. 

24 But if you do this with a likelihood 

25 or Bayesian modeling version of fixing it, you 
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1 have — it's tied up into one person as a 

2 one-analysis model. And so each analyst would 

3 have to worry about how to handle the missing 

4 data problem, each team would have to worry 

5 about itself, and then people would argue about 

6 how the missing data was handled and how it was 

7 bundled with the model. So it's more 

8 straightforward to fix up the missing data 

9 problem once through multiple imputation and use 

10 that as a data set. 
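The workflow described here (fill in the missing values several times, analyze each completed data set, then pool) can be sketched as follows. The data are hypothetical, and the imputation step is deliberately crude: it draws donors at random from the observed values, whereas an imputation model for a survey like NMES would condition on covariates and reflect uncertainty in the model itself. The pooling step uses the standard combining rules for multiple imputation.

```python
import random
from statistics import mean, variance

def impute_once(values, rng):
    """Fill each missing value (None) with a random draw from the observed values."""
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]

def combine(estimates, variances):
    """Pool m completed-data results: average estimate, and total variance
    = within-imputation variance + (1 + 1/m) * between-imputation variance."""
    m = len(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    return mean(estimates), within + (1 + 1 / m) * between

rng = random.Random(1)
# Hypothetical expenditures with three missing entries.
costs = [120.0, None, 300.0, 80.0, None, 950.0, 0.0, None, 410.0, 60.0]

m = 5
estimates, variances = [], []
for _ in range(m):
    completed = impute_once(costs, rng)
    estimates.append(mean(completed))                       # analysis: mean cost
    variances.append(variance(completed) / len(completed))  # its sampling variance
pooled, total_var = combine(estimates, variances)
print(f"pooled mean = {pooled:.1f}, standard error = {total_var ** 0.5:.1f}")
```

Because the completed data sets are created once, any analyst can rerun the loop with a different analysis in place of the mean, which is the flexibility the answer describes.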

11 Q. What would be the acceptable methods of 

12 dealing with the missing data problem in NMES 

13 you have just described, the multiple 

14 imputation? 

15 A. Well, in the context of a particular 
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16 analysis that somebody had in mind, somebody 

17 could do a likelihood-based analysis, 

18 likelihood-based model, and that would be valid 

19 at least in large samples, under that model, 

20 under the model that's being used to analyze the 

21 data. 

22 Q. Any others which would be acceptable? 

23 A. In this context where there are patches 

24 of missing data all over the place and the 

25 problem is not really due to unit nonresponse, 
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1 it's hard to imagine any other general technique 

2 other than those two. The other one that comes 

3 to mind is weighting adjustment, and where you 

4 have these complex patterns of missing data, 

5 these adjustments sort of fall apart. So I 

6 don't think they would work here. 

7 Q. In the damage studies that you have 

8 reviewed that the plaintiffs in various states 

9 have done, what were the methods used to impute 

10 missing data in the NMES data? 

11 A. Well, some of the missing data in NMES 

12 were imputed by the agency that produced the 

13 data set and — 

14 Q. And that agency is? 

15 A. The Agency for — 

16 MR. BIERSTEKER: Health Care Policy 

17 and Research? 
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18 A. (continuing) A-H whatever it is. And I 

19 believe they did single imputation where they 

20 did it using a hot-deck procedure. 

21 Q. That's a government agency; right? 

22 A. Yes, that is, correct. 

23 Q. Why don't you describe a hot-deck 

24 procedure. 

25 A. What that does is, here's a person with 
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1 missing expenditures, let's say, and that person 

2 has an age, a gender, and an income. And you 

3 find somebody with the same age plus or minus 

4 ten years, the same gender — did I say income? 

5 — and the same income plus or minus $20,000. 

6 That's not what the hot deck really is like, but 

7 that's an easy way to describe it. And then 

8 grabs that person's value, this matching 

9 person's value of expenditures and drops it in. 

10 In fact, what the hot deck does, it 

11 doesn't do plus or minus, it makes cells, so 

12 genders are two levels of gender. They may have 

13 four levels of income, categories of income. 

14 And what was the other variable I used in my 

15 little — 

16 MR. BIERSTEKER: Age? 

17 A. (continuing) Age. And three levels of 

18 age, forming a two-by-four-by-three table, 24 

19 cells. And the nonrespondent, the guy with 

20 missing expenditures, falls in one of those 
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21 cells. And you grab someone who falls in the 

22 same cell with medical expenditures, and grab it 

23 and drop it in, and say, "That's exactly the 

24 right value." 
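The cell-based hot deck described in this answer can be sketched in a few lines. The band boundaries, field names, and records are hypothetical; only the two-by-four-by-three cell layout follows the witness's example, and this is not the agency's actual procedure.

```python
import random
from collections import defaultdict

def age_band(age):
    """Three hypothetical age bands."""
    return 0 if age < 35 else 1 if age < 65 else 2

def income_band(income):
    """Four hypothetical income bands of $20,000, with the top band open-ended."""
    return min(int(income // 20000), 3)

def cell(record):
    """Gender x income band x age band: 2 * 4 * 3 = 24 possible cells."""
    return (record["gender"], income_band(record["income"]), age_band(record["age"]))

def hot_deck(records, rng):
    """Fill each missing cost with the cost of a random donor from the same cell."""
    donors = defaultdict(list)
    for r in records:
        if r["cost"] is not None:
            donors[cell(r)].append(r["cost"])
    filled = []
    for r in records:
        copy = dict(r)
        if copy["cost"] is None and donors[cell(r)]:
            copy["cost"] = rng.choice(donors[cell(r)])
        filled.append(copy)  # stays None if the cell has no donor
    return filled

records = [
    {"gender": "F", "age": 40, "income": 30000, "cost": 500.0},
    {"gender": "F", "age": 45, "income": 35000, "cost": None},
    {"gender": "M", "age": 70, "income": 10000, "cost": 900.0},
]
filled = hot_deck(records, random.Random(0))
print(filled[1]["cost"])
```

The second record shares a cell with the first, so it receives that donor's expenditure as if it were the observed value, which is the single-imputation behavior the witness criticizes.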

25 Q. What is a cold-deck procedure? 
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1 A. You go back to a preassigned value for 

2 somebody who is missing that particular value. 

3 So in other words, anyone who is missing 

4 expenditures and has these characteristics, some 

5 level of characteristics, gets a value that's 

6 been preassigned from the cold deck. The jargon 

7 is old. It goes back to days of the IBM cards 

8 where data was stored on IBM cards and the hot 

9 deck referred to the Current Survey. So you 

10 would find before the days of computers — those 

11 cards go back to the turn of the century, when 

12 they had the sorting machines. 

13 So you would take — they punched 

14 holes where he was missing something, and run 

15 the cards through, matching up the same 

16 expenditures. You would take, put the guy with 

17 missing data in, and he would have these little 

18 holes in his IBM card indicating his background 

19 characteristics, and you would wire the board to 

20 pick up those things. Then you would grab the 

21 card from the hot deck and put it in the card 

22 sorter, and those guys who matched those holes 
23 where they were observed would drop out, and

24 that would be the hot-deck values from which you 

25 would draw the imputation. 
1 In the old days they had these cards

2 in a particular order so you would grab the

3 first one, census-tract order. And the

4 cold-deck idea was instead of using the hot deck, the

5 real survey people, you grab a deck of cards 

6 made up before the survey was done on, where you 

7 would get the values to impute, based on 

8 matching them up. 

9 Q. In any event, some of the missing data 

10 in NMES was imputed by the agency that produced 

11 the NMES data using single-imputation hot-deck 

12 procedure? 

13 A. Correct. 

14 Q. Now, the plaintiff's reports in the 

15 various states that you have looked at, do they 

16 also impute some of the other missing data from 

17 NMES? 

18 A. Yes. 

19 Q. What techniques were used in those 

20 reports? 

21 A. In various reports sometimes they used 

22 best value prediction, based on a regression 

23 model. Sometimes they imputed arbitrary values 

24 in the sense that if somebody was missing 

25 "married," whether they were married or not,
they imputed them as single even though they had
no idea. What other techniques did they use? 
Sometimes they did a random draw, a single 
imputation. 

Q. What do you mean, random draw? 

A. Instead of — suppose I'm trying to 
impute married/single, but you are missing 
married/single. They even did a model — they 
may have found a cell where there was both 
married and single people with the same 
background characteristics and took a random 
draw, like a random-draw hot deck. You find a 
guy who matches. So they did a version of hot 
deck. 

I think they may have done some 
modeling where they actually ran Probit, 
P-R-O-B-I-T, or Logit, L-O-G-I-T, regressions to 
predict some missing values, I think. And then 
you would take — there may have been some where 
they even imputed the probability. I don't 
remember now. 

But I think there were ones where, 
off the — from a model like that, they took a 
random draw with that probability to fill it 
in.
1 What other kinds of things did they

2 do? Oh, effectively in — I guess that was in 

3 Minnesota, they built one of these effective 

4 selection models, which is what it tries to, but 

5 doesn't really adjust for unit nonresponse. 

6 This is between the NMES supplement and the 

7 people who didn't have the NMES supplement, 

8 where they were missing tobacco use. 

9 It's effectively an imputation 


10 technique, although I don't think they used it

11 that way. But they dealt with missing data in

12 that way.

13 Let's see. I ran out. There may be

14 other things that they did, but I haven't

15 thought about that for a while.

16 Q. Single-imputation hot-deck procedure,

17 is that a statistically valid method of imputing

18 missing data?

19 A. No.

20 Q. Was it at any time?

21 A. No.

22 Q. Never?

23 A. Never. Was it ever thought to be or

24 was it never? It was never valid.

25 Q. Was it ever generally accepted in the
1 field of statistics as statistically reliable?





2 MR. BIERSTEKER: Object to the

3 form.

4 A. I don't believe so. I think if you read 

5 discussions of it, even from years ago, that 

6 people knew it had problems. They just hoped 

7 the problems weren't severe. You can find older 

8 literature. I'm thinking of guys who have died 

9 a while back. Bill Cochran, an expert in survey

10 sampling, or Morris Hansen, who was with the

11 Census Bureau and one of the founders of Westat, 

12 a major consulting survey firm in 

13 Washington. 

14 They were aware of the problems. 

15 And I think if you look in the textbook written 

16 in 1953 by Hansen, Hurwitz and Madow, my memory

17 is that there is discussion in there of the 

18 invalidity of standard errors and confidence 

19 intervals from doing hot-deck procedures and 

20 attempts to get valid standard errors by doing 

21 things like balanced repeated replication, BRR. 

22 So it's something that was known to be an issue 

23 for I believe close to half a century. 

24 Q. Single-imputation hot-deck procedures 

25 were generally used by statisticians; correct? 
1 A. They were generally used by agencies to

2 try to make the problem go away, and then they 

3 sort of worried about it a little bit and hoped 
4 that -- because nonresponse was not a major

5 problem, this was just fixing up the minor 

6 things, two or three or maybe five percent 

7 missing data. 

8 There's a three-volume set that I 

9 co-authored one or two volumes, done by the 

10 National Academy of Sciences. It must have been 

11 published in the early 80's. But there was a 

12 chapter on hot deck, and I haven't read that for 

13 — I don't know — 15 years. But they were 

14 certainly aware of issues there, then, and how 

15 to get around the issues and get valid 

16 inferences. It was a topic. 

17 And one of the things that's 

18 changed, of course, over the years is 

19 computers. And so people then were trying to do 

20 things such that the data set could be done 

21 using sorting machines and very basic computing 

22 available. 

23 And the world has changed since 

24 then. Now there's another issue, the problem of 

25 nonresponse becoming more severe, and in 
1 government surveys generally. There's a lot of

2 concern about the amount of nonresponse in 

3 federal surveys that has been increasing, so the 

4 problem has become something that has demanded 

5 more attention. 

6 Another data set used in some of 
7 these is NHANES. That's produced by the

8 National Center for Health Statistics, and for 

9 many, many years they used a hot deck, and no 

10 longer. The NHANES that will be released — I 

11 was just talking to the people a week ago, and 

12 that version of NHANES, that is multiply 

13 imputed, not done by the hot deck but based on 

14 techniques that I have devised and helped them 

15 with, and former students of mine worked with. 

16 That will be the data set you will be able to 

17 get, and the current data set available from the 

18 National Center for Health Statistics for NHANES 

19 has no imputations at all. 

20 Their decision, based on seeing the 

21 problems that were created by single hot-deck 

22 imputation, the way they did for many years, the 

23 problems were more severe than they had 

24 realized, and they didn't want to do it that way 

25 anymore. 
1 So the decision was, if you, the

2 user of NHANES, are going to make a mistake with 

3 the missing data, it will be your mistake. It 

4 won't be due to having a hot-deck bias. 

5 Q. That's why you developed multiple 

6 imputation, because of the problem of the 

7 single-imputation hot-deck procedure? 

8 A. That was one of the motivating things. 





9 The first reason I did it, it was back in the 

10 mid-70's when I was working with the Social 

11 Security Administration. There was a new survey 

12 coming up, and they were worried about the 

13 nonresponse being more severe than in previous 

14 surveys. They wanted me to think about handling 

15 it. 

16 I had been doing things on missing 

17 data before that. I wrote a little paper, I 

18 think in 1976, sort of proposing multiple 

19 imputation as a way to think about it. That was 

20 the first thing that I wrote explicitly on 

21 multiple imputation. But it was trying to 

22 address the problem that everybody saw was 

23 coming up, missing data and how to deal with it 

24 in a valid way, primarily in federal surveys. 
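(Illustrative sketch, not part of the testimony. The witness does not spell out how multiply imputed results are combined. The sketch below assumes the standard combining rules for multiple imputation: each of the m completed data sets yields an estimate and its variance, the pooled estimate is the average, and the total variance adds the between-imputation spread to the average within-imputation variance. The function name and the numbers are hypothetical.)

```python
from statistics import mean, variance


def combine_multiple_imputations(estimates, within_variances):
    """Pool m completed-data analyses into one estimate and one variance.

    estimates        -- point estimate from each completed data set
    within_variances -- squared standard error from each completed data set
    Requires at least two imputations.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(within_variances)   # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Three hypothetical completed-data results.
pooled, total_var = combine_multiple_imputations(
    [10.0, 12.0, 14.0], [4.0, 4.0, 4.0]
)
```

(A single imputation has no between-imputation term to estimate, so an analysis that treats the filled-in values as observed reports only the within-imputation variance.)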

25 Q. I infer from your answers with regard 
1 to the single-imputation hot-deck procedure that

2 the various methods you described that the 

3 plaintiffs in the states have used for 

4 imputation of missing data are likewise not 

5 statistically valid? 


6 A. Correct.

7 Q. Is the best value based on regression

8 analysis a commonly used statistical method for

9 imputing missing values?

10 A. I hope not.

11 Q. Do you know?

12 A. Do I know whether it's commonly used?

13 It's not commonly used in the circles I deal

14 with. 

15 Q. Which are — 

16 A. Which are the academics that I know who 

17 are at major universities, in both statistics 

18 and economics, the statistician types in 

19 economics. In the federal agencies that I deal 

20 with, if it's done, where the problems with it 

21 — I don't know anyone who does it in federal 

22 agencies. I don't know if it's common or not. 

23 What I'm hedging against is if you 

24 could find all these published articles where 

25 they are doing an analysis, still doing it. I 
1 don't doubt it, but is it common? Not to my

2 knowledge, and that's why I said I hope not,

3 because I have been involved with quite a few

4 federal agencies to try to improve the practice, 

5 and they are all aware of the problems of doing 

6 that. 

7 Q. Are Probit and Logit regressions 

8 commonly used to impute missing data? 

9 MR. BIERSTEKER: I will object to 

10 the form. 

11 A. Again, I don't know. I mean as a part 

12 of a technique to get multiple imputations, 

13 there's nothing wrong with it. It can be used 
14 to impute a single imputation as well; there's

15 something wrong with it. 

16 Q. The question wasn't whether there was 

17 something wrong with it. It's whether you know 

18 if it was commonly used.

19 MR. BIERSTEKER: I renew the 

20 objection as to form. 

21 A. I don't know whether it's commonly used 

22 or not. As a piece of a modeling effort to 

23 multiply impute, I certainly know examples of 

24 it. Probit and Logit models being used to 

25 multiply impute. 
1 Q. But always as part of multiple

2 imputation? 

3 A. Those are the examples I am focused 

4 on. Are they commonly used for doing single 

5 imputation? It may well be. 

6 

7 (Recess taken.) 

8 

9 Q. If you will turn, Doctor Rubin, to page

10 five of the report, which is page six of yours. 

11 A. Right. 

12 Q. You have some examples, and I'm looking 

13 at Example One on this page. And I assume that 

14 Example One represents the same individual, one 

15 in the factual world and one in the 

16 counterfactual world? 
17 A. Exactly.

18 Q. In performing the modeling that you've

19 hypothecated, would it be necessary in this

20 case, how many different -- and I guess, is it

21 an age cohort? How many different -- it

22 wouldn't be an age cohort. How many different

23 periods of time would you have to create

24 separate models for?

25 MR. BIERSTEKER: I object to the

1 form.

2 A. Well, in fact I would create one model 

3 for the whole period of time, so it would 

4 include time trends in the model. 

5 Q. And you would have within the time 

6 trends, you would have a group of smokers 

7 represented by those in the counterfactual world 

8 who have quit for one year, two years, three 

9 years, et cetera? 

10 A. Not quite. I would build a model, for 

11 example, for health care expenditures with data 

12 from the actual world. And then when talking 

13 about, when estimating the difference between 

14 medical expenditures in the factual and 

15 counterfactual world, I would bring in 

16 assumptions about change in smoking prevalence 

17 in the counterfactual world. And that would 

18 allow me to compute the extra health care costs 
19 in the actual world beyond those in the

20 counterfactual world without the alleged 

21 misconduct. 
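(Illustrative sketch, not part of the testimony. It assumes a simplified version of the calculation the witness outlines: mean expenditures for each smoking behavior come from a model fitted to actual-world data, and the counterfactual world enters only through assumed prevalences of those behaviors. The behavior labels, costs and prevalences are hypothetical, and background characteristics are ignored for brevity.)

```python
def excess_cost_per_person(mean_cost, actual_prevalence, counterfactual_prevalence):
    """Expected per-person cost in the actual world minus the counterfactual world.

    mean_cost                 -- modeled mean expenditure for each smoking behavior
    actual_prevalence         -- share of the population in each behavior, actual world
    counterfactual_prevalence -- assumed shares without the alleged misconduct
    """
    return sum(
        (actual_prevalence[b] - counterfactual_prevalence[b]) * mean_cost[b]
        for b in mean_cost
    )


# Hypothetical inputs: three behaviors, prevalences summing to one in each world.
mean_cost = {"never": 1000.0, "former": 1300.0, "current": 1600.0}
actual = {"never": 0.5, "former": 0.2, "current": 0.3}
counterfactual = {"never": 0.6, "former": 0.2, "current": 0.2}

excess = excess_cost_per_person(mean_cost, actual, counterfactual)
```

(Nothing in the formula forces the result to be positive: a counterfactual world with more costly behaviors, or longer lives, would give a negative excess.)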

22 Q. Then I guess it would be in the 

23 prevalence model where you would be determining 

24 on a year-by-year basis? 

25 A. Correct. 
1 Q. And would you then have -- in some

2 fashion would you account for people who had not 

3 quit, people who had quit for one year, two 

4 years, three years, et cetera? 

5 A. Correct. All smoking behaviors that 

6 are considered relevant to health care costs. 

7 In other words, if the medical-epidemiological 

8 experts said there was no difference between a 

9 one- and two-year quitter with respect to 

10 health, then that's their call, and I wouldn't 

11 distinguish between those two kinds. If they 

12 said there is a difference, then there's a 

13 difference, and I have to consider that 

14 difference. 

15 Q. Do you know, with regard to that, what 

16 the epidemiologists say? 

17 A. No, I don't. 

18 Q. That wasn't a factor in your 

19 consideration? 

20 A. No, the factor in my consideration was 

21 just, if they say it's important, then I regard 
it as important. If they say it -- the "they"
is some community of people who are considered
experts in that area.

Q. An example I'm more curious about is 
Example Two, which begins on page seven of yours.

A. Right. 

Q. You state in the introductory section,
Example Two, about the third sentence, "For 
instance, a person may not be in the Medicaid 
population in the Factual World but could be in 
the Medicaid population in the Counterfactual 
World." And what factors would lead you to 
conclude that the person might be in Medicaid in 
the counterfactual world but not in the actual 
world? 

A. There are a variety of ways somebody 

can come in. One of them is illustrated by the 
example where the person in the Counterfactual 
World 1 lives longer and has heart surgery and 
bypass surgery and then goes into a nursing 
home. 

Another example, a kind of example, 
is illustrated by the following story. Let's 
suppose we have a person in the actual world in 
the Medicaid population with certain background 
characteristics and confounding factors and 
24 smoking behaviors. So he has a list of those

25 characteristics. And there's a person with 
1 exactly those same characteristics in the actual

2 world who doesn't have Medicaid expense, but he 

3 looks exactly the same. So he's a clone with 

4 respect to these characteristics that have been 

5 described as being all the important ones for 

6 determining medical expenditures, Medicaid 

7 expenditures. 

8 Well, in a counterfactual world, one 

9 could claim, since they are clones, the guy who 

10 didn't have Medicaid expenses this time, if you 

11 ran the world again, even without any changes, 

12 he would have some chance, some probability of 

13 having Medicaid expense, because the reason why 

14 his clone did and he did not is a matter of 

15 chance. 

16 So for example, let's suppose they 

17 both drive motorcycles, and the guy has Medicaid 

18 expenses because he was in a motorcycle accident 

19 and his clone who looks exactly the same as the 

20 other guy, also drives a motorcycle, didn't

21 have Medicaid expenses this time. Rerun the 

22 world, he might very well. 

23 Q. How could you control for that example 

24 in a counterfactual model? 

25 A. It comes out of the mathematics. 
1 There's a probability under a certain assumption

2 that that person will have Medicaid expense. If 

3 you believe you have controlled for all 

4 important variables, it automatically pops out 

5 of the modeling. Somehow you have to consider, 

6 you have — I mean, you can assume it's not 

7 there just by fiat, say, "No, no, no, in the 

8 counterfactual world, even though he's a clone, 

9 I will just assert that he can't have expenses." 

10 And I'm not going to argue that that's not an 

11 assumption that can be made. But if you are 

12 going to make it, you have to make it. You 

13 can't sort of bury it somewhere. 

14 Q. Would it be reasonable to assume that 

15 in a counterfactual world, the probabilities are 

16 equivalent to the probabilities in the factual 

17 world, and you should get roughly approximately 

18 the same percentage of folks on Medicaid as not? 

19 A. In the — there are assumptions that 

20 would lead you to that, to reach that 

21 conclusion, yes. If you believe, for example, 

22 you have all the important factors, having 

23 determined having Medicaid expenses, then in a 

24 rerun world you have everybody who has the same 

25 combination of characteristics, that is, smoking 

1 and background and confounding factors, someone

2 who did have Medicaid in this world would have 

3 some probability of having Medicaid expenses in 

4 the counterfactual world, if you believe you 

5 have controlled for the important variables. 

6 Q. The example continues on the next page, 

7 and again we are looking at the same person in 

8 an actual world and two counterfactual worlds? 

9 A. Correct. 

10 Q. In the Counterfactual World 1 — which 

11 is the only example where there are Medicaid 

12 expenses, as far as I can tell; is that right? 

13 A. That's correct. Basically later I 

14 assume — well, the examples displayed on this 

15 page, it doesn't say whether they are Medicaid 

16 expenses or not. But on the bottom of this page 

17 it says, "If one assumes that none of this 

18 individual's health care costs prior to age 60 

19 were borne by Medicaid," so basically that's the 

20 working assumption. 

21 Q. And Counterfactual World 1 assumes no 

22 alleged misconduct; correct? 

23 A. Correct. 

24 Q. Can you explain to me how the Medicaid 

25 expenses in the Counterfactual World 1 are 
1 related to the defendants' alleged misconduct?

2 A. Related to? 
3 Q. Are they a result of the defendants'

4 alleged misconduct?

5 A. Well, the lack of expenses in the 

6 factual world compared to the expenses in the 

7 counterfactual world are due to the alleged 

8 misconduct. In other words, these are negative 

9 expenses, now. 


10 Q. I see that. My question is, in the

11 Counterfactual World 1, are the Medicaid

12 expenses a consequence of the defendants'

13 alleged misconduct?

14 MR. BIERSTEKER: Object to the

15 form.

16 A. They are a consequence of the lack of

17 alleged misconduct.

18 Q. All right. In Counterfactual World 2,

19 where -- this is I guess what I have been

20 accused of referring to as the Twinkie example.

21 A. Twinkie?

22 Q. The person stops smoking, overeats

23 heavily, develops high cholesterol, has a heart

24 attack and dies.

25 A. I like the label.




1 Q. The Twinkie defense is actually a

2 defense raised in some criminal cases where


3 antisocial behavior has been associated with the 

4 level of blood sugar getting spiked. 
5 At any time in Counterfactual World

6 2 is the overeating heavily a consequence of the

7 defendants' alleged misconduct?

8 MR. BIERSTEKER: Object to the

9 form.

10 A. It's a consequence of the lack of the

11 alleged misconduct.

12 Q. Are you familiar with the concept of

13 ceteris paribus?

14 A. Sort of.

15 Q. What do you understand it to be?

16 A. Everything else being equal, I would

17 change nothing else.

18 Q. Are you familiar with the concept? Do

19 you know how the ceteris paribus concept is

20 utilized in economic analysis of damages?

21 A. Perhaps, perhaps not. I mean, I have a

22 general understanding of how it's used in

23 economics.

24 Q. What is that understanding?

25 A. For example, in the context of

1 regression models, you leave all the other

2 values the same except the one you are

3 interested in knowing the effect of, and that's

4 the variable you change.

5 For example, in the plaintiff's

6 models such as Harrison in Oklahoma or the other

7 ones, you change -- you turn off smoking, to see

8 what the costs would be in a world without


9 smoking but leave everything else the same. All 

10 the other factors that you have in the model are 

11 left set at their values as they exist, and the 

12 only thing you change is the smoking. 
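(Illustrative sketch, not part of the testimony. It assumes a linear regression for expenditures that has already been fitted; the coefficients, covariate names and function names are hypothetical. "Turning off smoking" means predicting once with the person's covariates as observed and once with the smoking indicator set to zero, with every other covariate left at its observed value.)

```python
def predicted_cost(coefs, person):
    """Linear prediction: intercept plus coefficient times covariate value."""
    return coefs["intercept"] + sum(coefs[name] * value for name, value in person.items())


def smoking_attributable_cost(coefs, person):
    """Observed-covariate prediction minus the prediction with smoking turned off."""
    no_smoking = dict(person, smoker=0)  # change only the smoking indicator
    return predicted_cost(coefs, person) - predicted_cost(coefs, no_smoking)


# Hypothetical fitted coefficients and one hypothetical person.
coefs = {"intercept": 500, "age": 20, "smoker": 300}
person = {"age": 50, "smoker": 1}

difference = smoking_attributable_cost(coefs, person)
```

(In this linear form the difference is just the smoking coefficient. The calculation builds in the assumption that nothing else, such as diet or longevity, changes when smoking does.)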

13 Q. So using ceteris paribus as a guide

14 here, we would turn off smoking, and we would 

15 not insert Twinkies or overeating heavily? 

16 A. Correct. But — that is an assumption 

17 one can make, but you don't have to make that 

18 assumption. There are more sophisticated 

19 assumptions that you can make. 

20 Q. On page eight, page nine on yours, 

21 three-fifths of the way down, there's a 

22 paragraph that states, "And of course, there 

23 exists another Counterfactual World, not 

24 displayed above, in which this individual's 

25 smoking behavior is unaffected by the alleged 
1 misconduct, which automatically means that there

2 would be no effect of the alleged misconduct on 

3 States' health care expenditures." 

4 Doctor Harris attempts to account 

5 for that, does he not? 

6 A. Yes, yes. 

7 Q. Is his attempt to account for that 

8 individual who does not change smoking behavior 

9 reliable in your estimation? 
10 A. No.


11 Q. Why? 

12 A. Because of the way he does the 

13 calculations, for several reasons. We already 

14 spoke about the two major kinds of things. One 

15 is when he relies on actual data to come up with 

16 estimates that are — from real data sets. He 

17 didn't summarize them in the usual kinds of 

18 ways, is it statistically valid, giving point 

19 estimates, confidence intervals and standard 

20 errors, et cetera. 

21 Moreover, with respect to 

22 explication of assumptions, he doesn't explicate 

23 the assumptions in such a way that they are 

24 clear and explicit and convert to the necessary 

25 parts of a model. Moreover, he doesn't 
1 condition these background characteristics

2 anywhere. And instead, it's this filtering of 

3 information through his mind to come up with 

4 certain posited things that lead to combinations 

5 of assumptions about changing prevalence and 

6 relative risks that aren't sort of 

7 mathematically correct. That's not the way to 

8 combine those things. 

9 So there are a variety of reasons 

10 why that's not reliable. 

11 Q. With respect specifically to that 

12 point, the individuals who did not alter their 
13 smoking behavior in the counterfactual world,

14 can you identify any of the assumptions that 

15 Doctor Harris made? 

16 A. Well, he makes assumptions about quit 

17 rates, initiation rates. He makes assumptions 

18 about behavioral changes and bundles them 

19 together with relative risks, so that he has 

20 these behavior-modified relative risks changing 

21 in time. I assume that the question was 

22 directed at those kinds of issues. 

23 Q. And that wouldn't be limited just to 

24 those who don't quit smoking? It would apply to 

25 assumptions he made with regard to "smoked" with 
1 "quit" or "never started"?

2 A. Yes, correct. 

3 Q. And that there are the same assumptions 

4 about quit rates, initiation rates, behavioral 

5 changes, that he's assumed that he has bundled 

6 with relative risk? 

7 A. I'm not sure there are the same. Would 

8 you put the question differently? There are 

9 assumptions that he has made. 

10 Q. Those assumptions that you just 

11 described that Doctor Harris made would be 

12 common assumptions that he made not only with 

13 regard to those individuals in the 

14 counterfactual world who would not quit, but 


also those who would quit, or would not initiate 
in the counterfactual world? 

A. I believe the answer is yes, but it's a 
complex question. I'm not quite sure. 

Q. I think it's yes, too. Can you 
identify the source of Doctor Harris's 
assumption about quit rates in the 
counterfactual world? 

A. I think they are based on a variety of 
things that he — let me see. Quit rates? If I 
remember precisely where the quit rates came 
from -- it's not popping to mind right now where
those assumptions come from. I think the 
context involves more information on 
advertising, but do I remember what the tables 
show? He has points in time where different 
things took place and rates are dropping. I 
don't remember precisely. 

Q. In order to do the modeling that you 
have just described, it would be necessary — 
one would have to make assumptions about quit 
rates; correct? 

A. Absolutely. 

Q. And assumptions about initiation rates? 

A. Absolutely. 

Q. And assumptions about behavioral 

changes? 

A. Yes. 
Q. And you try to find data about the
responsiveness of consumers to information and
how it affected those rates?

A. Yes.

Q. And you would premise your assumptions
on that?

A. But I would state the listing very
clearly, those things, and how those initiation
rates, quit rates, would affect prevalence of
all the different kinds — prevalences of all 
the different kinds of smoking behaviors that 
are regarded as being relevant. 

Q. When you say "all the different kinds 
of smoking behaviors," you are talking about 
not only smoking and not smoking, but you are 
also talking about having quit for any number of 
years? 

A. Correct. And how many packs a day you 
smoke and how many packs a day you smoked 
before, whether you are smoking low tar 
cigarettes or high tar cigarettes, filtered 
cigarettes or not. All the smoking behaviors 
that the medical types, the epidemiologists, 
think have medical consequences. If they think 
they have those consequences, then we have to 
consider the different prevalences of them in 
order to reach a reliable conclusion. 
20 Q. Now, turn to page nine, page ten on

21 yours, of your report.

22 A. Okay.

23 Q. Down where it says "Statistically, each

24 instance of defendants' alleged misconduct" --

25 A. Right.
1 Q. Paren, "specified by its character and

2 its timing," close paren, "can be viewed as 

3 defining a level of a factor in a hypothetical 

4 factorial experiment." 

5 When you say, "each instance of 

6 defendants' alleged misconduct," what do you 

7 mean by that? 

8 A. Well, for example, the first example is 

9 the failure to market safer cigarettes. And if 

10 there is the claim, if he claims that that's 

11 misconduct, and then it should have happened, 

12 the claim is, at that date or another date or 

13 another date, then that could be four levels of 

14 that factor for the different dates. So the 

15 actual world is one level, and the three 

16 different levels of where the dates would be 

17 claimed could be, if that's the claim. 

18 The second factor is this example of 

19 information: they should have advertised the 

20 health risks. And if there is just one date 

21 being claimed where they should have done it, 

22 that's one value for that factor, versus the 
23 actual world. But if the claim is that, if the

24 consideration is it should have happened at this 

25 date or earlier date or earlier date, it would 
1 be three levels for that in addition to the

2 actual world. 
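(Illustrative sketch, not part of the testimony. It assumes two factors as in the witness's examples: the date at which safer cigarettes would have been marketed, with the actual world plus three claimed dates, and the date at which health risks would have been advertised, with the actual world plus one claimed date. The date labels are hypothetical placeholders. Each combination of levels is one cell of the hypothetical factorial experiment.)

```python
from itertools import product

# Level "actual" means the conduct occurred as it did in the actual world.
safer_cigarette_levels = ["actual", "date A", "date B", "date C"]  # 4 levels
health_warning_levels = ["actual", "date X"]                       # 2 levels

# Every combination of one level per factor is a distinct world to consider.
worlds = list(product(safer_cigarette_levels, health_warning_levels))

# The cell with both factors at "actual" is the actual world; the rest are counterfactual.
counterfactual_worlds = [w for w in worlds if w != ("actual", "actual")]
```

(With four and two levels there are eight cells in all, seven of them counterfactual. Adding claimed dates to either factor multiplies the number of worlds for which prevalences must be stated.)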

3 Q. Does it extend so far as to a 

4 consideration of each instance in which 

5 information should have been included in 

6 advertisements and was not? 

7 A. Only if the argument is that would lead 

8 to different prevalences or different kinds of 

9 smoking. Again, for each distinct kind of 

10 behavior on the part of the defendant, that 

11 would lead to a change in prevalence, if that's 

12 the claim, then if we are going to consider 

13 those things separately, then we have to 

14 consider them separately. 

15 If the plaintiff didn't want to 

16 consider them separately, just wanted to 

17 consider one, then that's all. I am playing a 

18 role as a statistician. If you are going to say 

19 that these three things would have distinct 

20 effects, if that's what you are claiming, then 

21 we have to consider them as having distinct 

22 effects and consider them. We can't bundle them 

23 all together and forget about the fact you are 

24 making that claim. 
Q. Does Doctor Harris's analysis include
examples of when safer cigarettes would have
been marketed in a counterfactual world? 

A. Yes, he has a couple of examples, I 
think, of — let's suppose he — he moves the 
dates back about when they would have been, and 
so that he — and he bundles that into the 
relative risks of smoking and how that would 
have changed. 

Q. Is the problem with him doing that, 
part of the problem, bundled into the relative 
risk? 

A. That's part of the problem. The 
calculations aren't done correctly. The 
information he's bringing toward the problem — 
the kinds of information he's bringing toward 
the problem, I don't object to, really, that, 
the claim that safer cigarettes would have been 
introduced here and would have happened at this 
particular date, and if that had happened, we 
would have seen changes in prevalence in smoking 
high tar cigarettes versus safer cigarettes, and 
this would be transpired in time. That would be 
explicit, and reasons given for why that is 
true, and statements that this changed 
prevalence. 

1 Q. And that far, he did do that; correct? 

2 A. He did it, but he's got it — he 

3 bundles it in with relative risk rather than the 

4 prevalence, and the calculations just 

5 technically don't work out that way. The 

6 averaging he does doesn't work out. It's not 

7 the right way to combine those things. He is 

8 calculating averages at the wrong level. 

9 Q. Explain to me what level he should 

10 calculate at. 

11 A. If — let's start with the — pretend 

12 we have a list of say ten distinct smoking 

13 behaviors that have associated with them 

14 different relative risks. One would be no 

15 smoking, one would be smoking low tar 

16 cigarettes, the safer cigarettes that should 

17 have been introduced earlier. Another smoking 

18 behavior would be former smoker, quit ten years 

19 ago, five years ago. Another, one pack a day 

20 unfiltered cigarettes, whatever the list of 

21 smoking behaviors that Doctor Harris and other 

22 medical people and epidemiologists think have 

23 distinct — have distinct consequences on 

24 medical expenditures. 

25 So we have a list of them. For 
1 example, there's ten. And for those ten, let's 

2 suppose we have the relative risks, defined by 

3 subpopulations of background characteristics and 

4 confounding factors or conditioned on relative 

5 risks in terms of medical expenditure. 

6 If we have those relative risks, now 

7 let's suppose we have the prevalences of those 

8 smoking behaviors in the actual world, also as a 

9 function of background characteristics and 

10 confounding factors. Now, what we have to do is 

11 make assumptions to get at what the prevalence 

12 of those smoking behaviors would be in the 

13 counterfactual world without the particular 

14 instances of alleged misconduct, also as 

15 function of background characteristics and other 

16 confounding factors. 

17 And at that point Harris could just, 

18 I think quite eloquently, from what I have read, 

19 say what those are in the — in the 

20 counterfactual world without these alleged acts 

21 of misconduct, why he thinks there would be the 

22 prevalences of those behaviors. Other people 

23 might not agree with him that those would be the 

24 consequences, but that's — presumably the 

25 defendants would say no, no, the alleged 
1 misconduct would not have that dramatic effect. 

2 But that's not my call. 

3 Now at that point, once he specified 
4 those relative risks as a function of background 

5 characteristics, confounding factors, those 

6 actual prevalences as a function of background 

7 characteristics, other confounding factors and 

8 counterfactual prevalences of those behaviors, 

9 smoking behaviors as a function of background 

10 characteristics and other confounding factors, 

11 then he is done. He's done his job in the sense 

12 of, the model takes over. You combine those 

13 things and you get an answer for how much extra 

14 medical expenses there are. 

15 But that's not what he did. He sort 

16 of put them all in a bowl, these three distinct 

17 things, and stirred them up and applied a very 

18 simple formula that doesn't really work, and got 

19 an answer. 

20 Does that help? 
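
As a rough illustration of the combination described in this answer, the three ingredients (relative risks, actual prevalences, and counterfactual prevalences, each defined within a subpopulation) can be combined stratum by stratum and then summed. This is only a minimal sketch: the strata, the four smoking behaviors, the baseline costs, and every number below are hypothetical and are not taken from any report in the case.

```python
# Hypothetical illustration only: strata, behaviors, and all numbers are made up.
BEHAVIORS = ["never", "former", "low_tar", "high_tar"]

STRATA = {
    "younger": {
        "n": 1000,               # people in the stratum
        "baseline_cost": 100.0,  # expected cost for a never-smoker
        "rr": {"never": 1.0, "former": 1.2, "low_tar": 1.5, "high_tar": 2.0},
        "actual": {"never": 0.5, "former": 0.1, "low_tar": 0.1, "high_tar": 0.3},
        "counterfactual": {"never": 0.5, "former": 0.1, "low_tar": 0.3, "high_tar": 0.1},
    },
    "older": {
        "n": 500,
        "baseline_cost": 400.0,
        "rr": {"never": 1.0, "former": 1.5, "low_tar": 2.0, "high_tar": 3.0},
        "actual": {"never": 0.4, "former": 0.3, "low_tar": 0.1, "high_tar": 0.2},
        "counterfactual": {"never": 0.4, "former": 0.3, "low_tar": 0.2, "high_tar": 0.1},
    },
}


def mean_cost(stratum, world):
    """Expected cost per person in one stratum under the given prevalences."""
    weighted_rr = sum(stratum[world][b] * stratum["rr"][b] for b in BEHAVIORS)
    return stratum["baseline_cost"] * weighted_rr


def excess_expenditure(strata):
    """Actual minus counterfactual expenditure, combined within each stratum first."""
    return sum(
        s["n"] * (mean_cost(s, "actual") - mean_cost(s, "counterfactual"))
        for s in strata.values()
    )


print(round(excess_expenditure(STRATA), 2))
```

In this sketch the relative risks stay fixed for each behavior, and only the prevalences differ between the actual and counterfactual worlds, which is the separation the answer asks for.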

21 Q. Yes. And because he stirred them up 

22 that way rather than applying the correct 

23 formula, it's not reliable, and it's not valid? 

24 He was trying to account for what he believes is 

25 a change in the toxicity of cigarettes; correct? 

1 A. Correct. But that really is a change. 

2 I mean, those different kinds of cigarettes are 

3 out there as smoking behaviors, and what's 

4 changing is the prevalence of people smoking the 

5 low toxicity versus high toxicity cigarettes. 
6 And combining them the way he did, it doesn't 

7 work out technically. 

8 Q. So with regard to the toxicity, the way 

9 to properly do that would be to trend over time 

10 in the actual world, the various smoking 

11 populations as defined by relevant 

12 characteristics of those who have switched to 

13 the less toxic products? 

14 A. Yes, people who have switched, because 

15 those are different behaviors. That's right. 

16 Model the effect of that smoking behavior on 

17 medical outcomes in the actual world. And then 

18 you can say — and then in the counterfactual 

19 world you have your prevalence change from the 

20 actual world in those smoking behaviors. 

21 And if the data are thin, it makes 

22 it harder to do it. I can understand that data 

23 can be thin. Now, when the data become thin, 

24 you state what your assumptions are, so other 

25 people can see how you got the answer. You just 
1 don't stir it up and pull an answer out. 

2 Q. I don't want to mischaracterize. I 

3 thought you said that you had no problem with 

4 the information he was using about safer 

5 cigarettes and when they would be introduced; 

6 that your problem was the mixing in the models. 

7 A. Let me clarify. I do have some 

8 problems with the information. I don't have a 
9 problem with his general flow. But the problem 

10 I have with the information, I will go back to 

11 two things. First of all, the studies that he 

12 is relying on for that information, I didn't see 

13 how the statistics were summarized adequately. 

14 Also, the information he's relying 

15 on isn't conditional on these background 

16 characteristics and other confounding factors 

17 which formally have to be in there. So when I 

18 am saying I don't have a problem with 

19 information he is relying on, it's more what he 

20 is attempting to do. He's attempting to do it 

21 at a much too coarse a level without explication 

22 of where he gets things. And then the way 

23 pieces are bundled together isn't correct. 

24 So it's, when you say a different 

25 way, he clearly has a tremendous amount of 
1 knowledge about the area and about the pieces 

2 that are relevant. But how to get those pieces 

3 functioning, these confounding factors, and 

4 combine all these pieces that he is interested 

5 in, in the right way, is something that he's not 

6 doing. 

7 Q. In that same paragraph I was quoting, 

8 towards the bottom of page nine, "Issues of 

9 additive versus synergistic effects of the 

10 alleged acts of misconduct would be addressed by 
11 posited interaction in the hypothetical 

12 experiment," what does that sentence mean? 

13 A. It means that — let's consider the two 

14 examples above of alleged misconduct, safer 

15 cigarettes and information. If the — let's 

16 suppose we said somebody decided, say the court 

17 decided that there was no misconduct due to 

18 advertising, so we turn that factor off. And 

19 then there's an amount that would be due just to 

20 the failure to introduce safer cigarettes, an 

21 amount of dollars. 

22 Now let's suppose the court decides 

23 the other way, that safer cigarettes weren't an 

24 act of misconduct, but the advertising was. So 

25 we turn off the safer cigarettes and see what 
1 happens to advertising, and we get a pot of 

2 money there. So we have two pots of money. 

3 And now let's say the court decides 

4 they are both acts of misconduct. If there is 

5 no synergism, no interaction, you get the answer 

6 when you leave the — you get the correct pot of 

7 money where they are both acts of misconduct by 

8 adding these two separate pieces together. That 

9 means there's no interaction. 
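
The "two pots of money" description amounts to a simple check, sketched here under the assumption that a damages figure is available for each of the three scenarios. The function name and the dollar amounts are hypothetical.

```python
def interaction(damages_safer_only, damages_advertising_only, damages_both):
    """Amount by which the combined figure departs from the sum of the two pots.

    Zero means the two alleged acts are additive (no synergism); a nonzero
    value is the interaction that would have to be posited explicitly.
    """
    return damages_both - (damages_safer_only + damages_advertising_only)


# Hypothetical dollar amounts, for illustration only.
print(interaction(40.0, 60.0, 100.0))  # additive case
print(interaction(40.0, 60.0, 130.0))  # synergistic case
```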

10 Q. I don't suppose you are familiar with 

11 the Washington rules of evidence? 

12 A. I don't believe so. Do you want to 

13 educate me? 
14 Q. No. If the act of misconduct alleged 

15 by the state is a conspiracy to restrain 

16 competition on the basis of health performance 

17 of cigarettes, then is it necessary to make this 

18 — to deal with issues of additive versus 

19 synergistic effects? 

20 A. Well, I guess if — let me start 

21 again. Since these individual acts of alleged 

22 misconduct are used by Doctor Harris to draw 

23 conclusions about what would have happened in 

24 their absence, it seems to me — okay. And if 

25 he can make an argument that the whole entire 
1 effect on them would have this effect on 

2 prevalence of all these smoking behaviors, and 

3 if there's no doubt that no one cares about 

4 anything other than all the acts there or all 

5 the acts of alleged misconduct not there, then 

6 you don't have to worry about it. Although it 

7 might make it easier for someone else to try to 

8 assess how he got to his counterfactual world 

9 prevalence if they understood the logic flowing 

10 from each one. 

11 But that's not a statistician's 

12 call. That's sort of in some sense a 

13 scientist's call if they can follow his logic 

14 and say yes, I believe those prevalences in the 

15 counterfactual would be just as you described 
16 them, because you have got all these things 

17 going at the same time. 

18 Q. Or perhaps a jury's? 

19 A. Or a jury's, correct, whoever makes the 

20 call. But even for a jury, I would think it 

21 would be easier if they are unbundled. But 

22 that's not my business. 

23 Q. Please turn to page 11 of the report, 

24 page 12 of yours. The first full paragraph 

25 there is critical of Doctor Harris's discussion 
1 of safer products. As you state in the third 

2 sentence, "But in his report, Doctor Harris 

3 never specifies what products would have been 

4 introduced at what time, how much 'safer' those 

5 products would have been than products already 

6 on the market at that time, nor at what rate 

7 consumers would have adopted them." 

8 Is that a correct statement? I 

9 thought Doctor Harris specified products. As 

10 you said, he gave some examples. 

11 A. I don't remember specifically. There 

12 were certainly discussions of some. Maybe the 

13 sentence is a compound sentence, never specifies 

14 this, what products would have been introduced 

15 at what time. I think that's, as I read it now, 

16 I think that's what I meant to say: which 

17 products would have been introduced at what 

18 time. 
19 Q. He identifies some example products and 

20 gives several times when those would have been 

21 introduced under various assumptions; correct? 

22 MR. BIERSTEKER: Let me, just for a 

23 point of clarification — maybe it doesn't help 

24 at all, and if so, I will shut up. But I think 

25 we need to be careful when we are talking about 
1 this to distinguish between Doctor Harris's 

2 report in January of 1998, which was all that 

3 was available when Doctor Rubin wrote his 

4 report, and then the subsequent declaration 

5 which came out in June of 1998. 

6 MR. FERGUSON: I am actually talking 

7 about the January report, too. But the 

8 clarification does help. 

9 A. Certainly the declaration had a lot 

10 more detail in it about these kinds of reasoning 

11 than the report did. 

12 Q. In the January report Doctor Harris 

13 looks at a product code-named Ariel? 

14 A. I remember that. 

15 Q. He looks at nicotine analogs. And at 

16 least with the Ariel-type product, I think he 

17 hypothecated at least two dates when that 

18 product would have been introduced in a 

19 competitive market. Am I remembering that the 

20 same way you are? 
21 A. I now do remember that, and I'm not 

22 sure [...] 

[...]ition of Stanley Roberts in the Washington 

18 case? 

19 A. In the Washington case, did I look at 

20 that? I think I looked at one in Oklahoma. I 

21 don't remember whether I did or not in 

22 Washington. 

23 Q. When we were talking about imputing 

24 missing values, you identified for me some of 

25 the methods that had been used in the various 
1 state cases to do that, and I see on page 18 — 

2 19 on yours — that you describe some of these 

3 in that middle paragraph there. The best 

4 predicted value, you told me about that. 

5 "Sequentially without proper conditioning," 

6 what does that mean? 

7 A. What that refers to is the following 

8 important idea. When imputing, let's say 

9 missing expenditures, you have to condition 

10 formally on all the other variables in the data 

11 set, not just a couple of them that happen to be 

12 in the hot deck, for instance. Formally you 

13 have to condition on all the variables. And if 

14 you don't do that, you get biased estimates. 

15 You get the wrong answer. Even the point 

16 estimate is wrong. 

17 It's not just the standard errors 

18 are wrong. You are getting the wrong answer. 
19 Not just incorrect variability, systematically 

20 the wrong answer. You are systematically 

21 underestimating relationships, and therefore you 

22 are — in doing some kinds of modeling, using a 

23 data set imputed that way, you get the wrong 

24 answer for coefficients. 
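
The claim that imputing without conditioning on the other variables systematically understates relationships can be seen in a small simulation. This is a minimal sketch on synthetic data, not a reconstruction of any analysis in the case: the true slope of 2, the 50 percent missingness rate, and the simple "draw any observed value" fill-in are all assumptions chosen for illustration.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=2000, seed=0):
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]  # true slope is 2
    missing = [rng.random() < 0.5 for _ in range(n)]  # y missing completely at random

    observed_y = [y for y, m in zip(ys, missing) if not m]
    observed_x = [x for x, m in zip(xs, missing) if not m]

    # Fill each missing y with a randomly chosen observed y, ignoring x entirely.
    filled_y = [rng.choice(observed_y) if m else y for y, m in zip(ys, missing)]

    return slope(observed_x, observed_y), slope(xs, filled_y)


complete_case_slope, unconditioned_slope = simulate()
print(complete_case_slope, unconditioned_slope)
```

Because the filled-in values carry no information about x, the slope fitted to the filled-in data is pulled toward zero, while the slope among the observed cases stays near the true value in this completely-at-random setup.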

25 Q. How did Doctor Harrison impute missing 
1 values; do you recall? 

2 A. I don't think he did impute them, 

3 missing values. Did he analyze a particular 

4 data set by himself? 

5 Q. Harrison. 

6 A. I'm sorry, Harrison. My apology. 

7 Harrison imputed arbitrary values. 

8 Q. Do you have any understanding of how he 

9 arbitrarily chose them? 

10 A. Well, for example, I believe in the 

11 married question, he imputed you as being single 

12 if you were missing, and there were a variety of 

13 other variables that he just imputed arbitrary 

14 values for. In his report he says, "That gets 

15 rid of the problem of missing data because we 

16 will just arbitrarily fill something in, and 

17 then we don't have to worry about missing data." 

18 And he does a complete case 

19 analysis. He throws away a part of the data set 

20 where people don't have certain variables that 
21 he, even after doing this arbitrary filling in, 

22 he still has other values that are missing, and 

23 he just throws those people away. And it turns 

24 out the people he throws away are distinctly 

25 different from the people he keeps in, a 
1 well-known source of bias. 

2 Q. If a sample is random and there is 

3 missing data and someone wants to do a complete 

4 case analysis — is that the term? 

5 A. Correct. 

6 Q. One wants to do a complete case 

7 analysis, so we are going to throw out 

8 nonresponders, and the sample is random. Is 

9 that a problem? 

10 A. Yes. 

11 Q. Why? 

12 A. Let's be sure we mean the same thing by 

13 "complete case." That means you retain only 

14 those people who have every available thing you 

15 are interested in to observe; you throw out 

16 people who have something missing. If the people 

17 who have something missing are only a random 

18 subset of the original sample, then all you do 

19 is lose some efficiency, but it's not biased. 

20 If people who don't produce complete 

21 data you want to observe are systematically 

22 different from those who do produce complete 

23 data, then the result of analyzing just the 
24 complete reporters is a biased analysis. 
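
The distinction drawn in this answer can be illustrated with synthetic data. The sketch below assumes a single outcome with mean 50, and two hypothetical ways of losing cases: dropping a random 40 percent, versus dropping people with high values far more often than people with low values. None of the numbers come from NMES or any report.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate(n=5000, seed=1):
    rng = random.Random(seed)
    full = [rng.gauss(50.0, 10.0) for _ in range(n)]

    # Complete cases when the incomplete cases are a random subset.
    random_subset = [y for y in full if rng.random() < 0.6]

    # Complete cases when high values are much less likely to be complete.
    systematic_subset = [
        y for y in full if rng.random() < (0.2 if y > 55.0 else 0.9)
    ]

    return mean(full), mean(random_subset), mean(systematic_subset)


full_mean, random_mean, systematic_mean = simulate()
print(full_mean, random_mean, systematic_mean)
```

With the random subset only precision is lost; with the systematic subset the complete-case mean is shifted away from the full-sample mean.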

25 Q. What you are suggesting is that in 
1 essence, nonresponders are not a random 

2 selection? 

3 A. Correct. But the way you stated your 

4 question — let's suppose you start with the 

5 random sampling. If you said — let's suppose 

6 we start with a random sample, and further, some 

7 people with some missing data are a random 

8 sample from that random sample. Then all you do 

9 is lose some efficiency. 

10 Q. How does one determine randomness of 

11 the subset of nonresponders? 

12 A. You would look at the set of variables 

13 that are fully observed for everybody, for 

14 example, respondents and nonrespondents. If 

15 it's a random sample, the distribution of those 

16 variables that are fully observed will be the 

17 same for the nonresponders and the responders. 

18 Not so in NMES for the data set 

19 Harrison analyzes. They are dramatically 

20 different. 
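
One common way to carry out the comparison described here is to compute, for each fully observed variable, a standardized difference in means between responders and nonresponders. The sketch below assumes that particular summary, which the testimony does not specify, and uses made-up values.

```python
import math
import statistics


def standardized_difference(responders, nonresponders):
    """Difference in means divided by the pooled standard deviation.

    Values near zero are what a random subset would tend to produce;
    large values suggest the two groups differ systematically.
    """
    mean_gap = statistics.mean(responders) - statistics.mean(nonresponders)
    pooled_sd = math.sqrt(
        (statistics.variance(responders) + statistics.variance(nonresponders)) / 2
    )
    return mean_gap / pooled_sd


# Hypothetical values of one fully observed variable in each group.
print(standardized_difference([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
print(standardized_difference([1, 2, 3], [3, 4, 5]))
```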

21 Q. How are they different? 

22 A. Do I remember the individual 

23 variables? I have got a table in my Oklahoma 

24 report that shows that, that shows that they are 

25 dramatically different. Do I remember the 
individual variables on which they are most 
different? I wish I did right now. 

Again, this is something like the 
other analysis that I would rather not guess at, 
but it's a table in the Oklahoma report. 

Let me give you the flavor. This is 
not correct. It might be correct; it might not 
be. But, for example, my memory is that they 
differ on income level, I believe. 

Q. They couldn't afford a stamp, or 
something? 

A. Well, there are — I believe there's a 
long literature with — not long. There is some 
literature that suggests that income does relate 
to people's willingness to respond. I don't 
remember which way it went in this. There may 
be a racial difference as well, maybe an 
employee difference as well. 

I don't remember which way it goes, 
but perhaps those who are employed feel they 
have less time to deal with all the questions. 

I don't know. But there is a table that 
documents that. 

Q. If you were conducting a damage study 
in a case such as this and building the models 
that you have described would be necessary, 
would you conduct any efforts to collect 
additional data beyond what already exists? 


A. Yes, I would. 

Q. What would you do? 

A. Well, let's suppose that we have the 
NMES Medicaid group as it is now. For example, 
one of the things that would be nice to 
supplement that with, in fact essential if you 
are going to build these valid, allegedly 
reliable models, would be the distribution of 
these background characteristics and other 
confounding factors and smoking in the 
population for which damages are being sought. 

Q. So you would want to know the 
distribution of those background 
characteristics? 

A. In the group of the people for whom 
damages are being sought. 

Q. And smoking behavior in the Washington 
Medicaid population? 

A. And how those relate to actual 
expenses. 

Q. And you would do what, some kind of 
survey? 
1 A. Yes, I would do a survey of Medicaid 

2 recipients, and then on the basis of that survey 

3 and then matching that data to actual health 

4 expenditures for each person in the survey, 

5 relate those background characteristics to 

6 health expenditures. 

7 Q. When on your last page you say that, 

8 "as documented in my January 1998 Supplemental 

9 Report in Minnesota, the differences between 

10 smokers and nonsmokers in NMES are so 

11 substantial that the attempted adjustments using 

12 regression models are known to be entirely 

13 unreliable," that's what we were talking about 